
Wire up DatabasePropertyTermStore in WikibaseRepo
Closed, Resolved (Public)

Event Timeline

Change 512984 had a related patch set uploaded (by Alaa Sarhan; owner: Lucas Werkmeister (WMDE)):
[mediawiki/extensions/Wikibase@master] Wire up DatabasePropertyTermStore in WikibaseRepo

https://gerrit.wikimedia.org/r/512984

While working on this and testing the rebuild script locally on some properties I had previously imported from Wikidata (using the Importer extension), the following text appeared in a term of P2:

વ્યક્તિનું મુખ્ય કાર્ય ક્ષેત્ર (ભૌતિકવિજ્ઞાન, ઈતિહાસ), વ્યવસાય નહિ (ભૌતિકવિજ્ઞાની, ઈતિહાસવિદ્...તેથી જુઓ ગુણધર્મ:P૧૦૬)

That text is 308 bytes according to strlen (118 characters according to mb_strlen). The insert failed, with the database complaining that the value is too long to store in wbx_text VARBINARY(255) in the wbt_text table.

I wonder how those are currently stored in the wb_terms table. We have to fix this (= decide what to do with such cases) before the migration can happen in production anyway.
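The gap between the byte count and the character count reported above comes from UTF-8 encoding: code points in the Gujarati block take three bytes each, while a VARBINARY column is limited by bytes, not characters. The following is a minimal Python sketch of that check, not Wikibase code; the helper names `utf8_lengths` and `fits_in_column` are hypothetical, and only the 255-byte limit is taken from the thread. In PHP the same two numbers would come from strlen and mb_strlen.

```python
# Byte limit of wbx_text VARBINARY(255), as reported in this thread.
WBX_TEXT_MAX_BYTES = 255


def utf8_lengths(text):
    """Return (byte_length, character_length) of text when stored as UTF-8."""
    return len(text.encode("utf-8")), len(text)


def fits_in_column(text, limit=WBX_TEXT_MAX_BYTES):
    """True if the UTF-8 encoding of text fits into a column of `limit` bytes."""
    return len(text.encode("utf-8")) <= limit


# A short Gujarati sample: every code point here needs 3 bytes in UTF-8,
# so the byte length is three times the character length.
sample = "ગુજરાતી"
byte_len, char_len = utf8_lengths(sample)
print(byte_len, char_len, fits_in_column(sample))
```

A description of roughly 100 Gujarati characters can therefore exceed 255 bytes even though its character count looks well within the limit.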

Change 513110 had a related patch set uploaded (by Alaa Sarhan; owner: Alaa Sarhan):
[mediawiki/extensions/Wikibase@master] Wire up PropertyTermStore in WikiebaseRepo

https://gerrit.wikimedia.org/r/513110

It’s truncated in wb_terms:

MariaDB [wikidatawiki_p]> SELECT term_text FROM wb_terms WHERE term_full_entity_id = 'P101' AND term_language = 'gu' AND term_type = 'description';
+------------------------------------------------------------+
| term_text                                                  |
+------------------------------------------------------------+
| વ્યક્તિનું મુખ્ય કાર્ય ક્ષેત્ર (ભૌતિકવિજ્ઞાન, ઈતિહાસ), વ્યવસાય નહિ (ભૌતિકવિજ્ઞાની, ઈતિહાસવિદ્...ત |
+------------------------------------------------------------+
1 row in set (0.04 sec)

We might not even explicitly truncate in Wikibase – we don’t run MariaDB in “strict mode” (see T108255), so any overlong values are just silently truncated. Or perhaps we do truncate in Wikibase; I haven’t checked yet.

Change 513158 had a related patch set uploaded (by Lucas Werkmeister (WMDE); owner: Lucas Werkmeister (WMDE)):
[mediawiki/extensions/Wikibase@master] Wire up DatabasePropertyTermStore in WikibaseRepo

https://gerrit.wikimedia.org/r/513158

Change 512984 merged by jenkins-bot:
[mediawiki/extensions/Wikibase@master] Fix for utf8 texts, using StringNormalizer.

https://gerrit.wikimedia.org/r/512984

Looks like we do not truncate, because I could reproduce the issue locally: the first attempt to insert the text failed with the error message 'value is too long'.

Not quite sure then how this should be handled. I can think of two options here:

  • Make the wbx_text column bigger.
  • Truncate programmatically before looking up values, to avoid false negatives when the Acquirer searches for existing values before inserting them.
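The second option could look roughly like the following Python sketch. It is an illustration under assumptions, not the Wikibase implementation: `truncate_utf8` and `InMemoryTextStore` are hypothetical names, the dictionary stands in for the wbt_text table, and only the 255-byte limit comes from the thread. The point is that the same truncation is applied before both the lookup and the insert, so a second acquisition of the same overlong text finds the existing row.

```python
MAX_BYTES = 255  # size of wbx_text VARBINARY(255), as reported in this thread


def truncate_utf8(text, max_bytes=MAX_BYTES):
    """Shorten text so that its UTF-8 encoding fits into max_bytes."""
    encoded = text.encode("utf-8")
    if len(encoded) <= max_bytes:
        return text
    # A cut at a byte offset may split a multi-byte character;
    # errors="ignore" drops the incomplete trailing sequence.
    return encoded[:max_bytes].decode("utf-8", errors="ignore")


class InMemoryTextStore:
    """Hypothetical stand-in for the text table, keyed by the stored value."""

    def __init__(self):
        self._ids = {}

    def acquire_id(self, text):
        # Normalize first, so lookup and insert see the same value.
        key = truncate_utf8(text)
        if key not in self._ids:
            self._ids[key] = len(self._ids) + 1
        return self._ids[key]


store = InMemoryTextStore()
overlong = "a" + "ગ" * 100  # 301 bytes in UTF-8
first_id = store.acquire_id(overlong)
second_id = store.acquire_id(overlong)  # found again, no duplicate row
```

One consequence of this design is that two different texts sharing the same first 255 bytes would map to the same stored value, which is a trade-off the first option (a bigger column) avoids.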

Change 513110 abandoned by Alaa Sarhan:
Wire up PropertyTermStore in WikiebaseRepo

Reason:
in favor of If5fb399c9dafb59dbd39669f7dcf360fcee15100

https://gerrit.wikimedia.org/r/513110

Change 513158 merged by jenkins-bot:
[mediawiki/extensions/Wikibase@master] Wire up DatabasePropertyTermStore in WikibaseRepo

https://gerrit.wikimedia.org/r/513158

Closing – we should probably open another task for the truncation issue, though.