MediaInfo extension should not use the wb_terms table
Closed, Resolved · Public

Description

After creating some captions on Commons on beta, I checked the wb_terms table, and it is indeed being used.

MariaDB [commonswiki]> select * from wb_terms;
+-------------+----------------+---------------------+------------------+---------------+-----------+--------------------------------------+-----------------+-------------+
| term_row_id | term_entity_id | term_full_entity_id | term_entity_type | term_language | term_type | term_text                            | term_search_key | term_weight |
+-------------+----------------+---------------------+------------------+---------------+-----------+--------------------------------------+-----------------+-------------+
|           1 |              0 | M59928              | mediainfo        | en            | label     | Make a caption                       |                 |           0 |
|           2 |              0 | M59928              | mediainfo        | fr            | label     | Seulement une légende                |                 |           0 |
|           3 |              0 | M58796              | mediainfo        | en            | label     | Procedurally generated crystal in 3D |                 |           0 |
|           4 |              0 | M59928              | mediainfo        | es            | label     | another caption                      |                 |           0 |
+-------------+----------------+---------------------+------------------+---------------+-----------+--------------------------------------+-----------------+-------------+
4 rows in set (0.00 sec)
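
Grouping the rows by term_entity_type shows at a glance which entity types are writing to the table. The following is a minimal sketch, assuming an in-memory SQLite stand-in that holds only the four rows shown above and only the columns needed here (the real table lives in MariaDB, and the helper name count_terms_by_entity_type is hypothetical):

```python
import sqlite3

# The four rows from the query output above:
# (full entity id, entity type, language, term type, term text).
SAMPLE_ROWS = [
    ("M59928", "mediainfo", "en", "label", "Make a caption"),
    ("M59928", "mediainfo", "fr", "label", "Seulement une légende"),
    ("M58796", "mediainfo", "en", "label", "Procedurally generated crystal in 3D"),
    ("M59928", "mediainfo", "es", "label", "another caption"),
]


def count_terms_by_entity_type(conn):
    """Return {entity_type: row_count} for the wb_terms table."""
    cursor = conn.execute(
        "SELECT term_entity_type, COUNT(*) FROM wb_terms GROUP BY term_entity_type"
    )
    return dict(cursor.fetchall())


# Assumption: a reduced, SQLite-typed copy of the schema, for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE wb_terms ("
    "term_full_entity_id TEXT, term_entity_type TEXT, "
    "term_language TEXT, term_type TEXT, term_text TEXT)"
)
conn.executemany("INSERT INTO wb_terms VALUES (?, ?, ?, ?, ?)", SAMPLE_ROWS)

counts = count_terms_by_entity_type(conn)
print(counts)  # {'mediainfo': 4}
```

The same GROUP BY statement can be run directly in the MariaDB prompt; only the connection setup here is SQLite-specific.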

Lexeme has been implemented without using this table.

Related is T198866, which lists the use cases within Wikibase itself that we want to migrate away from.

Quote from IRC:

<duesen> (Daniel Kinzler) addshore: MediaInfo has no uniqueness constraints
[5:10 PM] we will want "label lookups" at some point

So it sounds like the table might accidentally be getting used right now even though it is not needed?

Event Timeline

Erm ... why not? "Terms" are labels, descriptions and aliases, afaik.

Indeed, but the table is badly designed and doesn't scale.

Hmmm ok. Its use of that table was already in place when I got here, I think, so it might take a little while to figure out how to extract it (edit: and since Matthias's baby arrived I am now the only dev working full-time on this). Is this necessary for going to production?

I guess you could go to production with this, but you would have to start thinking about migrating away from it as a priority.
It would be worth poking the DBAs if you decide to move to production with this.

The table should see linear growth similar to that of the wb_terms table on wikidatawiki, although I guess there are some different characteristics:

  • Captions will generally be longer than labels?
  • The rate of increase in row count could be greater than that on wikidatawiki, as you'll be creating a bunch of new entities quickly during the migration.

I would prefer that we don't enable more features in production that use the wb_terms table. That table is already in a _very_ bad state, and we should stop using it as soon as we can.
Getting more things to use this table is probably not what we want, as it would be another obstacle in the way of getting rid of this huge table.

One thing to note is that this will be commonswiki.wb_terms rather than wikidatawiki.wb_terms, so the existing table won't get bigger with MediaInfo. However, we will have a new table for commonswiki that will keep growing.

Good point. However, it is still the same problem: something that will need to be migrated away from.
So I would prefer that things are done right from the start so we can "forget" about it, rather than having another thing to keep in mind while getting rid of wb_terms.

I've briefly looked into our code around this, and I'm afraid I don't see it as remotely plausible for us to move WBMI away from the table in the next few months' work; we're blocked on Wikibase moving. WBMI doesn't have any code that decides where our data goes (it's all done by Wikibase itself, as they're "just" labels). Consequently we'll have to postpone this work. :-(

So, it doesn't look like WBMI actually uses the data in the table.
If that is the case then we can likely put something in place to prevent the table being written to.
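
As a rough illustration of what "preventing the table being written to" could look like, here is a minimal sketch in Python. It is not the Wikibase code (which is PHP), and every name in it (InMemoryTermIndex, EntityHandler, TERM_INDEX_ENTITY_TYPES, on_save, the "P1" example) is hypothetical. It assumes the save path can consult a per-entity-type allow-list before touching the term index:

```python
# Hypothetical sketch: none of these names are Wikibase's real API.


class InMemoryTermIndex:
    """Stand-in for the wb_terms table: just collects the rows written."""

    def __init__(self):
        self.rows = []

    def save_terms(self, entity_id, entity_type, labels):
        for language, text in labels.items():
            self.rows.append((entity_id, entity_type, language, "label", text))


class EntityHandler:
    # Assumption: only these entity types still need rows in the term index.
    TERM_INDEX_ENTITY_TYPES = frozenset({"item", "property"})

    def __init__(self, entity_type, term_index):
        self.entity_type = entity_type
        self.term_index = term_index

    def on_save(self, entity_id, labels):
        """Write labels to the term index only for allow-listed entity types.

        Returns True if rows were written, False if the write was skipped.
        """
        if self.entity_type not in self.TERM_INDEX_ENTITY_TYPES:
            return False
        self.term_index.save_terms(entity_id, self.entity_type, labels)
        return True


index = InMemoryTermIndex()
wrote_mediainfo = EntityHandler("mediainfo", index).on_save(
    "M59928", {"en": "Make a caption"}
)
wrote_property = EntityHandler("property", index).on_save(
    "P1", {"en": "example property"}
)
print(wrote_mediainfo, wrote_property, len(index.rows))
```

With a guard like this, MediaInfo captions are still stored with the entity itself, but no rows reach the term index, while entity types on the allow-list keep writing as before.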

Change 483376 had a related patch set uploaded (by Addshore; owner: Addshore):
[mediawiki/extensions/Wikibase@master] EntityHandler, don't write to TermIndex for all entity types

https://gerrit.wikimedia.org/r/483376

Change 483376 merged by jenkins-bot:
[mediawiki/extensions/Wikibase@master] EntityHandler, don't write to TermIndex for all entity types

https://gerrit.wikimedia.org/r/483376

Change 483388 had a related patch set uploaded (by Addshore; owner: Addshore):
[mediawiki/extensions/Wikibase@wmf/1.33.0-wmf.9] EntityHandler, don't write to TermIndex for all entity types

https://gerrit.wikimedia.org/r/483388

Change 483389 had a related patch set uploaded (by Addshore; owner: Addshore):
[mediawiki/extensions/Wikibase@wmf/1.33.0-wmf.12] EntityHandler, don't write to TermIndex for all entity types

https://gerrit.wikimedia.org/r/483389

These will be backported in a slot before the deployment of MediaInfo to real Commons today.

Change 483388 merged by jenkins-bot:
[mediawiki/extensions/Wikibase@wmf/1.33.0-wmf.9] EntityHandler, don't write to TermIndex for all entity types

https://gerrit.wikimedia.org/r/483388

Mentioned in SAL (#wikimedia-operations) [2019-01-10T15:12:42Z] <addshore@deploy1001> Synchronized php-1.33.0-wmf.9/extensions/Wikibase/repo/includes/Content: [[gerrit:483388|T208330 dont write to wb_terms for mediainfo]] (duration: 00m 55s)

Change 483389 merged by jenkins-bot:
[mediawiki/extensions/Wikibase@wmf/1.33.0-wmf.12] EntityHandler, don't write to TermIndex for all entity types

https://gerrit.wikimedia.org/r/483389

Mentioned in SAL (#wikimedia-operations) [2019-01-10T15:20:55Z] <addshore@deploy1001> Synchronized php-1.33.0-wmf.12/extensions/Wikibase/repo/includes/Content: [[gerrit:483388|T208330 dont write to wb_terms for mediainfo]] (duration: 00m 54s)

Mentioned in SAL (#wikimedia-operations) [2019-01-10T15:24:45Z] <addshore> T208330, MariaDB [testcommonswiki]> TRUNCATE TABLE wb_terms; # Was https://phabricator.wikimedia.org/P7973

Addshore claimed this task.

The extension no longer writes to the table.

This is only going to be Resolved for a couple of weeks until we enable Properties; fixing that is tracked in T208425.

The good thing about properties is, well, there won't be many rows in the table :)