Increase the size of wbt_text_in_lang.wbxl_language
Closed, Resolved · Public

Description

Follow-up to T232393: Find out why rebuilding some items in new term store failed
We can't store any terms in the new term store when an entity has terms in the 'zh-classical' language, because that code is 12 characters long and wbt_text_in_lang.wbxl_language is limited to 10. This needs to be fixed ASAP, as it blocks turning on reading from the new term store.

We agreed on 20 as the limit.
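A quick way to see the problem: MySQL/MariaDB `varbinary(N)` columns hold at most N bytes, and language codes are plain ASCII, so a code fits only if its length is at most N. The minimal Python sketch below checks the codes mentioned in this task against both widths; the helper name `fits_column` is hypothetical and not part of Wikibase.

```python
# Column widths of wbt_text_in_lang.wbxl_language before and after the change.
OLD_LIMIT = 10  # varbinary(10)
NEW_LIMIT = 20  # varbinary(20)


def fits_column(language_code: str, limit: int) -> bool:
    """Return True if the code fits in a varbinary(limit) column.

    varbinary limits are measured in bytes, so the code is encoded as
    UTF-8 before measuring (language codes are ASCII, so bytes == chars).
    """
    return len(language_code.encode("utf-8")) <= limit


if __name__ == "__main__":
    for code in ["en", "lzh", "nl-informal", "zh-classical"]:
        size = len(code.encode("utf-8"))
        print(f"{code!r}: {size} bytes, "
              f"fits old={fits_column(code, OLD_LIMIT)}, "
              f"fits new={fits_column(code, NEW_LIMIT)}")
```

Both zh-classical (12 bytes) and nl-informal (11 bytes) fail the old check and pass the new one.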

Event Timeline

Restricted Application added a subscriber: Aklapper.

After this is done, we need to rebuild everything that has terms in 'zh-classical' or 'nl-informal'.
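Finding what to rebuild amounts to collecting every entity with at least one term whose language code was too long for the old column. A minimal Python sketch, assuming rows of (term_full_entity_id, term_language) pairs like those returned by the wb_terms queries in this task; `entities_to_rebuild` is a hypothetical helper, and the ("Q42", "en") row is made up for contrast. Filtering on length rather than on the two named codes is an assumption that any code over 10 bytes failed the same way.

```python
OLD_LIMIT = 10  # byte width of wbxl_language before the schema change


def entities_to_rebuild(rows, limit=OLD_LIMIT):
    """Return entity IDs that have a term whose language code exceeds `limit` bytes.

    `rows` is an iterable of (entity_id, language_code) pairs. The result is
    de-duplicated and sorted by numeric ID.
    """
    ids = {
        entity_id
        for entity_id, language in rows
        if len(language.encode("utf-8")) > limit
    }
    return sorted(ids, key=lambda entity_id: int(entity_id[1:]))


if __name__ == "__main__":
    rows = [
        ("Q1321", "zh-classical"),     # from the wb_terms output in this task
        ("Q35", "zh-classical"),       # from the wb_terms output in this task
        ("Q72165760", "nl-informal"),  # alias row
        ("Q72165760", "nl-informal"),  # label row
        ("Q42", "en"),                 # made-up row for contrast
    ]
    print(entities_to_rebuild(rows))  # ['Q35', 'Q1321', 'Q72165760']
```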

Change 547730 had a related patch set uploaded (by Ladsgroup; owner: Ladsgroup):
[mediawiki/extensions/Wikibase@master] Increase the size of wbt_text_in_lang.wbxl_language

https://gerrit.wikimedia.org/r/547730

Change 547730 merged by jenkins-bot:
[mediawiki/extensions/Wikibase@master] Increase the size of wbt_text_in_lang.wbxl_language

https://gerrit.wikimedia.org/r/547730

Mentioned in SAL (#wikimedia-releng) [2019-11-01T16:40:04Z] <Amir1> ladsgroup@deployment-deploy01:~$ mwscript sql.php --wiki=wikidatawiki /srv/mediawiki-staging/php-master/extensions/Wikibase/repo/sql/increase_wbxl_language.sql (T237102)

After talking to @Amire80, it seems that we need to drop zh-classical in favor of lzh. It is barely used, which means the community is already cleaning it up:

mysql:research@s8-analytics-replica.eqiad.wmnet [wikidatawiki]> select * from wb_terms where term_language = 'zh-classical' limit 55;
+-------------+----------------+---------------------+------------------+---------------+-------------+--------------------------------------------------------------------------------------+-----------------+-------------+
| term_row_id | term_entity_id | term_full_entity_id | term_entity_type | term_language | term_type   | term_text                                                                            | term_search_key | term_weight |
+-------------+----------------+---------------------+------------------+---------------+-------------+--------------------------------------------------------------------------------------+-----------------+-------------+
|  2995932324 |              0 | Q1321               | item             | zh-classical  | description | 印歐一語,隸羅曼語族,本源西班牙,通於拉美列國,字以羅馬                             |                 |           0 |
|  2995934271 |              0 | Q334351             | item             | zh-classical  | description | 清帝,年號道光                                                                       |                 |           0 |
|  2995933714 |              0 | Q35                 | item             | zh-classical  | description | 北歐一國,都哥本哈根                                                                 |                 |           0 |
|  2995931300 |              0 | Q45                 | item             | zh-classical  | description | 南歐一國,都里斯本                                                                   |                 |           0 |
|  2995933519 |              0 | Q55                 | item             | zh-classical  | description | 西歐一國,都阿姆斯特丹,實都海牙                                                     |                 |           0 |
|  2987203583 |              0 | Q65924886           | item             | zh-classical  | label       | 關聖帝君戒淫經                                                                       |                 |           0 |
|  3050087841 |              0 | Q71582643           | item             | zh-classical  | label       | 分類:待選卓著                                                                        |                 |           0 |
|  3056413184 |              0 | Q72699512           | item             | zh-classical  | label       | 模板:IPA pulmonic consonants/table                                                   |                 |           0 |
|  3056419466 |              0 | Q72700587           | item             | zh-classical  | label       | 模板:IPA chart/core2                                                                 |                 |           0 |
|  3056452840 |              0 | Q72706479           | item             | zh-classical  | label       | 模板:IPA vowels/table                                                                |                 |           0 |
|  3056453724 |              0 | Q72706663           | item             | zh-classical  | label       | 模板:IPA vowels/styles.css                                                           |                 |           0 |
|  3057879044 |              0 | Q72942543           | item             | zh-classical  | label       | 維基大典:投票/汝同意刪除文言文維基否                                                 |                 |           0 |
|  3058451089 |              0 | Q73033088           | item             | zh-classical  | label       | 模板:~w                                                                              |                 |           0 |
+-------------+----------------+---------------------+------------------+---------------+-------------+--------------------------------------------------------------------------------------+-----------------+-------------+
13 rows in set (0.00 sec)

(If you change your language to zh-classical, you get lzh instead. I'm certain it's not a valid language code.)

OTOH, we can't drop nl-informal, because it's a valid language code, even though it's not used "much":

mysql:research@s8-analytics-replica.eqiad.wmnet [wikidatawiki]> select * from wb_terms where term_language = 'nl-informal' limit 55;
+-------------+----------------+---------------------+------------------+---------------+-----------+----------------------------------------------------------------------------------------------------------------+-----------------+-------------+
| term_row_id | term_entity_id | term_full_entity_id | term_entity_type | term_language | term_type | term_text                                                                                                      | term_search_key | term_weight |
+-------------+----------------+---------------------+------------------+---------------+-----------+----------------------------------------------------------------------------------------------------------------+-----------------+-------------+
|  3053621138 |              0 | Q72165760           | item             | nl-informal   | alias     | mijn broer in Washington spreekt nederlands en chinees mijn ouders zijn wel eens in het plaatsje oving geweest |                 |           0 |
|  3053621137 |              0 | Q72165760           | item             | nl-informal   | label     | nicolaas herman oving washington                                                                               |                 |           0 |
+-------------+----------------+---------------------+------------------+---------------+-----------+----------------------------------------------------------------------------------------------------------------+-----------------+-------------+
2 rows in set (0.00 sec)

I have now cleaned up all usages of zh-classical. Note that zh-classical redirects to lzh in MediaWiki, and people can't add a zh-classical term through the frontend (I tried), but they can still add one through the API. Since nl-informal is used only in Q72165760, I don't think anything prevents us from turning on reading from the new store for items up to, say, Q1k.
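The reasoning behind the Q1k proposal can be illustrated with a simple ID cutoff: the only remaining nl-informal usage, Q72165760, lies far outside the first thousand items. A minimal Python sketch; `reads_from_new_store` and the 1000 cutoff are hypothetical illustrations of the idea, not Wikibase's actual migration configuration.

```python
import re

# Hypothetical cutoff: read from the new term store only for items up to Q1000.
MAX_NEW_STORE_ITEM = 1000

_ITEM_ID = re.compile(r"Q([1-9][0-9]*)")


def numeric_id(item_id: str) -> int:
    """Extract the numeric part of an item ID such as 'Q42'."""
    match = _ITEM_ID.fullmatch(item_id)
    if match is None:
        raise ValueError(f"not an item ID: {item_id!r}")
    return int(match.group(1))


def reads_from_new_store(item_id: str, max_numeric_id: int = MAX_NEW_STORE_ITEM) -> bool:
    """Return True if the item falls inside the range served by the new store."""
    return numeric_id(item_id) <= max_numeric_id


if __name__ == "__main__":
    for item in ["Q35", "Q1000", "Q1321", "Q72165760"]:
        print(item, reads_from_new_store(item))
```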

Thoughts @alaa_wmde @Addshore?

Shall we leave this in the campsite verify column until the schema change is done?

Yeah, let's leave it here for a bit.

Addshore changed the task status from Open to Stalled. · Jan 7 2020, 10:55 AM

Stalled, waiting on the production failover.

Addshore triaged this task as Medium priority. · Jan 14 2020, 4:16 PM
Marostegui changed the task status from Stalled to Open. · Sep 3 2020, 9:42 AM
Marostegui subscribed.

The master has finally been altered, so this is unblocked.

@Addshore @Ladsgroup how would I confirm this is done?

On a local instance:

MariaDB [(none)]> use repo;
Reading table information for completion of table and column names
You can turn off this feature to get a quicker startup with -A

Database changed
MariaDB [repo]> desc wbt_text_in_lang;
+---------------+------------------+------+-----+---------+----------------+
| Field         | Type             | Null | Key | Default | Extra          |
+---------------+------------------+------+-----+---------+----------------+
| wbxl_id       | int(10) unsigned | NO   | PRI | NULL    | auto_increment |
| wbxl_language | varbinary(20)    | NO   | MUL | NULL    |                |
| wbxl_text_id  | int(10) unsigned | NO   | MUL | NULL    |                |
+---------------+------------------+------+-----+---------+----------------+
3 rows in set (0.003 sec)

wbxl_language should be varbinary(20), not varbinary(10).
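The same check can be scripted instead of read by eye. A minimal Python sketch that pulls the Type cell for a column out of the bordered table printed by the mysql/mariadb client's `desc`; `column_type` is a hypothetical helper, and the sample input is the local output above. Querying `COLUMN_TYPE` from `information_schema.COLUMNS` would avoid text parsing altogether.

```python
# Output of `desc wbt_text_in_lang;` copied from the local instance above.
DESC_OUTPUT = """\
+---------------+------------------+------+-----+---------+----------------+
| Field         | Type             | Null | Key | Default | Extra          |
+---------------+------------------+------+-----+---------+----------------+
| wbxl_id       | int(10) unsigned | NO   | PRI | NULL    | auto_increment |
| wbxl_language | varbinary(20)    | NO   | MUL | NULL    |                |
| wbxl_text_id  | int(10) unsigned | NO   | MUL | NULL    |                |
+---------------+------------------+------+-----+---------+----------------+
"""


def column_type(desc_output: str, column: str) -> str:
    """Return the Type cell of `column` from a bordered `desc` table."""
    for line in desc_output.splitlines():
        line = line.strip()
        if not line.startswith("|"):
            continue  # skip the +---+ border lines
        # Drop the outer pipes, then split into trimmed cells.
        cells = [cell.strip() for cell in line.strip("|").split("|")]
        if cells[0] == column:
            return cells[1]
    raise KeyError(column)


if __name__ == "__main__":
    actual = column_type(DESC_OUTPUT, "wbxl_language")
    print("OK" if actual == "varbinary(20)" else f"unexpected type: {actual}")
```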

On production:

amsa@amsa-Latitude-7480:~$ ssh mwmaint2001.codfw.wmnet
<blah blah blah>
ladsgroup@mwmaint2001:~$ sql wikidatawiki
<blah blah blah>

wikiadmin@10.192.48.86(wikidatawiki)> desc wbt_text_in_lang;
+---------------+------------------+------+-----+---------+----------------+
| Field         | Type             | Null | Key | Default | Extra          |
+---------------+------------------+------+-----+---------+----------------+
| wbxl_id       | int(10) unsigned | NO   | PRI | NULL    | auto_increment |
| wbxl_language | varbinary(20)    | NO   | MUL | NULL    |                |
| wbxl_text_id  | int(10) unsigned | NO   | MUL | NULL    |                |
+---------------+------------------+------+-----+---------+----------------+
3 rows in set (0.00 sec)

Preferably, double-check this on testwikidatawiki as well.

Thanks @Ladsgroup. I could have asked more clearly: I was trying to ask whether the structure of the table on Wikidata/Test Wikidata can only be inspected by logging in to production servers, or whether those tables are replicated somewhere that mortals like me could look at them. I should have remembered that they are not replicated, and your answer seems to confirm this.
Could you please run the describe on Test Wikidata's database for me as well? That would be the last step in confirming this task is done.

Sure:

wikiadmin@10.192.32.183(testwikidatawiki)> desc wbt_text_in_lang;
+---------------+------------------+------+-----+---------+----------------+
| Field         | Type             | Null | Key | Default | Extra          |
+---------------+------------------+------+-----+---------+----------------+
| wbxl_id       | int(10) unsigned | NO   | PRI | NULL    | auto_increment |
| wbxl_language | varbinary(20)    | NO   | MUL | NULL    |                |
| wbxl_text_id  | int(10) unsigned | NO   | MUL | NULL    |                |
+---------------+------------------+------+-----+---------+----------------+
3 rows in set (0.00 sec)