Page MenuHomePhabricator

Add code to write to change_tag_def table
Closed, ResolvedPublic

Description

The second part of normalizing change_tag schema: T185355: Normalize change tag schema This has to be done in a configurable way:

  • Set $wgChangeTagsSchemaMigrationStage to MIGRATION_WRITE_BOTH. This will cause the change_tag_def table and the ct_tag_id field to be written to when an edit is tagged, but not yet read from.

Event Timeline

Ladsgroup created this task.May 4 2018, 1:29 PM
Restricted Application added a subscriber: Aklapper. · View Herald TranscriptMay 4 2018, 1:29 PM

Set $wgChangeTagsSchemaMigrationStage to MIGRATION_WRITE_BOTH. This will cause the change_tag_def table and the ct_tag_id field to be written to when an edit is tagged, but not yet read from.

Well, the MIGRATION_WRITE_BOTH documentation in Defines.php describes this flag as “read the new schema preferentially, falling back to the old”… is it okay if we diverge from that a bit?

Change 434814 had a related patch set uploaded (by Ladsgroup; owner: Amir Sarabadani):
[mediawiki/core@master] Add code to write to change_tag_def table

https://gerrit.wikimedia.org/r/434814

Ladsgroup moved this task from Tasks to Needs Review on the Wikidata-Ministry-Of-Magic board.
Restricted Application added a project: User-Ladsgroup. · View Herald TranscriptMay 23 2018, 11:22 PM

Change 434814 merged by jenkins-bot:
[mediawiki/core@master] Add code to write to change_tag_def table

https://gerrit.wikimedia.org/r/434814

I enabled it on beta cluster and I'm seeing a funny problem there.
This is the recent change_tag entries:

wikiadmin@deployment-db04[enwiki]> select * from change_tag order by ct_id desc limit 50;
+--------+----------+-----------+-----------+-----------------------+-----------+-----------+
| ct_id  | ct_rc_id | ct_log_id | ct_rev_id | ct_tag                | ct_params | ct_tag_id |
+--------+----------+-----------+-----------+-----------------------+-----------+-----------+
| 214222 |   628760 |      NULL |    380872 | mw-rollback           | NULL      |        47 |
| 214221 |   628734 |      NULL |    380871 | visualeditor          | NULL      |        32 |
| 214220 |   628733 |      NULL |    380870 | visualeditor          | NULL      |        32 |
| 214219 |   628732 |      NULL |    380869 | visualeditor          | NULL      |        32 |
| 214218 |   628731 |      NULL |    380868 | visualeditor          | NULL      |        32 |
| 214217 |   628730 |      NULL |    380867 | visualeditor          | NULL      |        32 |
| 214216 |   628729 |      NULL |    380866 | visualeditor          | NULL      |        32 |
| 214215 |   628727 |      NULL |    380864 | visualeditor          | NULL      |        32 |
| 214214 |   628726 |      NULL |    380863 | visualeditor          | NULL      |        32 |
| 214213 |   628724 |      NULL |    380862 | visualeditor          | NULL      |        32 |
| 214212 |   628723 |      NULL |    380861 | visualeditor-wikitext | NULL      |        37 |
| 214211 |   628721 |      NULL |    380860 | visualeditor          | NULL      |        32 |
| 214210 |   628718 |      NULL |    380859 | visualeditor          | NULL      |        32 |
| 214209 |   628710 |      NULL |    380858 | visualeditor          | NULL      |        32 |
| 214208 |   628709 |      NULL |    380857 | visualeditor          | NULL      |        32 |
| 214207 |   628706 |      NULL |    380856 | visualeditor          | NULL      |        32 |
| 214206 |   628706 |      NULL |    380856 | mw-blank              | NULL      |        31 |
| 214205 |   628705 |      NULL |    380855 | mobile web edit       | NULL      |         2 |
| 214204 |   628705 |      NULL |    380855 | mobile edit           | NULL      |         1 |
| 214203 |   628704 |      NULL |    380854 | mobile web edit       | NULL      |         2 |
| 214202 |   628704 |      NULL |    380854 | mobile edit           | NULL      |         1 |
| 214201 |   628703 |      NULL |    380853 | mobile web edit       | NULL      |         2 |
| 214200 |   628703 |      NULL |    380853 | mobile edit           | NULL      |         1 |
| 214199 |   628702 |      NULL |    380852 | mobile web edit       | NULL      |         2 |
| 214198 |   628702 |      NULL |    380852 | mobile edit           | NULL      |         1 |
| 214197 |   628701 |    132658 |    380851 | mobile web edit       | NULL      |         2 |
| 214196 |   628701 |    132658 |    380851 | mobile edit           | NULL      |         1 |
| 214195 |   628701 |    132658 |    380851 | mw-new-redirect       | NULL      |        17 |
| 214194 |   628700 |    132657 |    380850 | mobile web edit       | NULL      |         2 |
| 214193 |   628700 |    132657 |    380850 | mobile edit           | NULL      |         1 |
| 214192 |   628700 |    132657 |    380850 | mw-new-redirect       | NULL      |        17 |
| 214191 |   628699 |      NULL |    380849 | mobile web edit       | NULL      |         2 |
| 214190 |   628699 |      NULL |    380849 | mobile edit           | NULL      |         1 |
| 214189 |   628698 |      NULL |    380848 | mobile web edit       | NULL      |         2 |
| 214188 |   628698 |      NULL |    380848 | mobile edit           | NULL      |         1 |
| 214187 |   628697 |      NULL |    380847 | mobile web edit       | NULL      |         2 |
| 214186 |   628697 |      NULL |    380847 | mobile edit           | NULL      |         1 |
| 214185 |   628696 |      NULL |    380846 | mobile web edit       | NULL      |         2 |
| 214184 |   628696 |      NULL |    380846 | mobile edit           | NULL      |         1 |
| 214183 |   628695 |      NULL |    380845 | mobile web edit       | NULL      |         2 |
| 214182 |   628695 |      NULL |    380845 | mobile edit           | NULL      |         1 |
| 214181 |   628694 |    132656 |    380844 | mobile web edit       | NULL      |         2 |
| 214180 |   628694 |    132656 |    380844 | mobile edit           | NULL      |         1 |
| 214179 |   628693 |      NULL |    380843 | mobile web edit       | NULL      |         2 |
| 214178 |   628693 |      NULL |    380843 | mobile edit           | NULL      |         1 |
| 214177 |   628692 |    132655 |    380842 | mobile web edit       | NULL      |         2 |
| 214176 |   628692 |    132655 |    380842 | mobile edit           | NULL      |         1 |
| 214175 |   628691 |      NULL |    380841 | mw-rollback           | NULL      |      NULL |

And this is change_tag_def table:

wikiadmin@deployment-db04[enwiki]> select * from change_tag_def;
+--------+-----------------------+------------------+-----------+
| ctd_id | ctd_name              | ctd_user_defined | ctd_count |
+--------+-----------------------+------------------+-----------+
|      1 | mobile edit           |                0 |        14 |
|      2 | mobile web edit       |                0 |        14 |
|     17 | mw-new-redirect       |                0 |         2 |
|     31 | mw-blank              |                0 |         1 |
|     32 | visualeditor          |                0 |        14 |
|     37 | visualeditor-wikitext |                0 |         1 |
|     47 | mw-rollback           |                0 |         1 |
+--------+-----------------------+------------------+-----------+

the ctd_id jumps to number of change_tag entries when it's defined and not by one. So the third row in change_tag_def has id of 17 instead of 3 because the first and the second rows have been used 16 times before the third gets introduced. This is fantastic but a real bug. Let me fix it.

TTO added a comment.EditedJun 8 2018, 10:41 AM

And why did the last row in change_tag have ct_tag_id NULL? It seems probable that ct_tag_id will eventually become part of the primary key, so I'm not sure why it is even allowed to be NULL. (never mind, I guess this is necessary for the window between the schema change and running the script)

Okay, It's because upsert increases AUTO_INCREMENT even when it ends up doing update only. This is used for improving concurrency (makes sense) and there is no easy way around it without losing too much. (Some links: https://www.reddit.com/r/PostgreSQL/comments/4bapat/upserts_increase_sequence_number/ https://stackoverflow.com/questions/3679611/mysql-upsert-and-auto-increment-causes-gaps) I'm undecided now. cc. @Catrope @daniel @tstarling

And why did the last row in change_tag have ct_tag_id NULL? It seems probable that ct_tag_id will eventually become part of the primary key, so I'm not sure why it is even allowed to be NULL.

This is needed during the migration process. The row that had NULL was added before setting $wgChangeTagsSchemaMigrationStag to MIGRATION_WRITE_BOTH (See https://gerrit.wikimedia.org/r/#/c/438060/) once everything is in place and backpopulated the change_tag table. We make the column non-nullable.

@Ladsgroup use NameTableStore...

I've been thinking about using NameTableStore for a while and looking at the code, it's amazing but it's hard to use it for this particular case. Because change_tag_def need to keep hitcount and if I could get away with not using hitcount, I'd definitely go with NameTableStore and (we wouldn't need change_tag_def table at all) but with this usecase in mind, the only I can think of using the names table is to store id of the change tag (in the name table) as string in change_tag_def->ctd_tag and do all the upsert magic again. Do you have a way to do it and also keeping the hitcount in mind?

daniel added a comment.Jun 8 2018, 1:08 PM

@Ladsgroup perhaps use the NameTableStore only for the IDs. Then, when updating, you do two steps:

  1. $tagId = $nameStore->acquireId( $tagName )
  2. UPDATE change_tag_def SET ctd_count = ctd_count +1 WHERE ctd_id = $tagId

I habn't tried this, this is just of the top of my head. Do you think it could work? It would avoid the use of upsert, since acquireId would guarantee that the row is there.

@daniel: acquireId can't do this because when inserting a new row, you need to determine ctd_user_defined, otherwise it doesn't let you do it but I can go with getId and upsert it when NameTableAccessException is thrown.

daniel added a comment.Jun 8 2018, 1:31 PM

if you don't use acquireId, then it's pointless.

can't you just always create the row with ctd_user_defined = 0, and change it to 1 afterwards in case that is needed?

Ladsgroup added a comment.EditedJun 8 2018, 1:43 PM

It is possible but we need to make ctd_user_defined defaults to zero which is not hard to implement in production (the table is empty everywhere atm) the only thing here is that schema change patches for core is not my favorite type of patch to make.

Anomie added a comment.Jun 8 2018, 1:55 PM

Or you could just do it more explicitly: start an atomic section, do a select with FOR UPDATE to determine which rows already exist, then update or insert as appropriate.

Or you could just do it more explicitly: start an atomic section, do a select with FOR UPDATE to determine which rows already exist, then update or insert as appropriate.

What I really love about NameTableStore is that it has a nice WAN cache wrapped around it that reduces reads on this table by one order of magnitude (at least) on master. This is pretty useful at the end.

Anomie added a comment.Jun 8 2018, 8:04 PM

Assuming you change whatever existing logic to use it for reads too, sure. Since you only partially updated the code for the new table so far, it's hard to see that.

Change 439568 had a related patch set uploaded (by Ladsgroup; owner: Amir Sarabadani):
[mediawiki/core@master] Introduce argument for insert callback in NameTableStore

https://gerrit.wikimedia.org/r/439568

Change 439568 merged by jenkins-bot:
[mediawiki/core@master] Introduce argument for insert callback in NameTableStore

https://gerrit.wikimedia.org/r/439568

Change 440367 had a related patch set uploaded (by Ladsgroup; owner: Amir Sarabadani):
[mediawiki/core@master] Make ChangeTag use NameTableStore for change_tag_def table

https://gerrit.wikimedia.org/r/440367

Change 440367 merged by jenkins-bot:
[mediawiki/core@master] Make ChangeTag use NameTableStore for change_tag_def table

https://gerrit.wikimedia.org/r/440367

Lydia_Pintscher closed this task as Resolved.Jun 20 2018, 3:10 PM
Lydia_Pintscher moved this task from Test (Verification) to Done on the Wikidata-Campsite board.
Vvjjkkii renamed this task from Add code to write to change_tag_def table to 3ldaaaaaaa.Jul 1 2018, 1:11 AM
Vvjjkkii reopened this task as Open.
Vvjjkkii removed Ladsgroup as the assignee of this task.
Vvjjkkii triaged this task as High priority.
Vvjjkkii updated the task description. (Show Details)
Vvjjkkii removed subscribers: gerritbot, Aklapper.
CommunityTechBot renamed this task from 3ldaaaaaaa to Add code to write to change_tag_def table.Jul 2 2018, 4:28 PM
CommunityTechBot closed this task as Resolved.
CommunityTechBot assigned this task to Ladsgroup.
CommunityTechBot raised the priority of this task from High to Needs Triage.
CommunityTechBot updated the task description. (Show Details)
CommunityTechBot added subscribers: gerritbot, Aklapper.