Migrate mentor/mentee relationship to a separate database table on Wikimedia wikis
Open, Medium · Public · Thu, Apr 29

Description

Background

To facilitate work on the mentor dashboard, we need to move the mentor/mentee relationship to a separate table. For more details, see T275773: Move mentor/mentee relationship to a separate database table to make it possible to run more queries on it.

Task objective

Do the mentor/mentee relationship migration on Wikimedia wikis.

Checklist
  • Set migration stage to SCHEMA_COMPAT_WRITE_OLD | SCHEMA_COMPAT_READ_OLD both on beta and prod (no-op; 678386)
  • Set migration stage to SCHEMA_COMPAT_WRITE_BOTH | SCHEMA_COMPAT_READ_OLD on beta and make sure relationship is written to both places (678389)
  • Run mwscript extensions/GrowthExperiments/maintenance/migrateMentorMenteeRelationship.php on all beta wikis with GE installed
  • Set migration stage to SCHEMA_COMPAT_WRITE_BOTH | SCHEMA_COMPAT_READ_NEW on beta and watch for breakages
  • Wait for T278573: Create growthexperiments_mentor_mentee database table on extension1 for wikis in growthexperiments.dblist to be completed
  • Wait for the maintenance script to get merged
  • Wait for 1.36-wmf.1 to be everywhere (https://versions.toolforge.org/)
  • Set migration stage to SCHEMA_COMPAT_WRITE_BOTH | SCHEMA_COMPAT_READ_OLD on testwiki and make sure mentor/mentee relationship data is written to database (680302)
  • Run mwscript extensions/GrowthExperiments/maintenance/migrateMentorMenteeRelationship.php --wiki=testwiki and make sure user_properties-powered data and growthexperiments_mentor_mentee data match (at least the number of rows)
  • Set migration stage to SCHEMA_COMPAT_WRITE_BOTH | SCHEMA_COMPAT_READ_NEW on testwiki and make sure there are no obvious breakages (680303). Alert the QA engineer.
  • Set migration stage to SCHEMA_COMPAT_WRITE_BOTH | SCHEMA_COMPAT_READ_OLD on all production wikis and make sure mentor/mentee relationship data is written to database (680304)
  • Run mwscript extensions/GrowthExperiments/maintenance/migrateMentorMenteeRelationship.php on all wikis in growthexperiments.dblist in production
  • Set migration stage to SCHEMA_COMPAT_WRITE_BOTH | SCHEMA_COMPAT_READ_NEW on production
  • Wait a week to ensure there are no breakages (until 2021-04-28)
  • Set migration stage to SCHEMA_COMPAT_WRITE_NEW | SCHEMA_COMPAT_READ_NEW both in beta and production (this is the point of no return)
  • Change the extension-provided default to SCHEMA_COMPAT_WRITE_NEW | SCHEMA_COMPAT_READ_NEW
  • Wait for the change to be train-deployed everywhere
  • Stop setting migration stage in operations/mediawiki-config
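
The stage values in the checklist are bit flags combined with `|`. The following is a minimal Python sketch of how the four rollout stages decompose, not GrowthExperiments code. The numeric values are meant to mirror MediaWiki core's SCHEMA_COMPAT_* constants but should be treated as an assumption; all that matters here is that each flag is a distinct bit. The helper names `describe` and `is_reversible` are hypothetical.

```python
# Assumed flag values (illustrative; each flag only needs to be a distinct bit).
SCHEMA_COMPAT_WRITE_OLD = 0x01
SCHEMA_COMPAT_READ_OLD = 0x02
SCHEMA_COMPAT_WRITE_NEW = 0x100
SCHEMA_COMPAT_READ_NEW = 0x200
SCHEMA_COMPAT_WRITE_BOTH = SCHEMA_COMPAT_WRITE_OLD | SCHEMA_COMPAT_WRITE_NEW

# The order in which the checklist moves each wiki through the stages.
ROLLOUT = [
    SCHEMA_COMPAT_WRITE_OLD | SCHEMA_COMPAT_READ_OLD,   # starting point (no-op)
    SCHEMA_COMPAT_WRITE_BOTH | SCHEMA_COMPAT_READ_OLD,  # dual writes, then run the script
    SCHEMA_COMPAT_WRITE_BOTH | SCHEMA_COMPAT_READ_NEW,  # new table becomes source of truth
    SCHEMA_COMPAT_WRITE_NEW | SCHEMA_COMPAT_READ_NEW,   # point of no return
]


def describe(stage):
    """Break a stage bitmask into what it writes and where it reads from."""
    return {
        "writes_old": bool(stage & SCHEMA_COMPAT_WRITE_OLD),
        "writes_new": bool(stage & SCHEMA_COMPAT_WRITE_NEW),
        "reads_from": "new" if stage & SCHEMA_COMPAT_READ_NEW else "old",
    }


def is_reversible(stage):
    """While the old store is still written, switching back to READ_OLD loses no data."""
    return bool(stage & SCHEMA_COMPAT_WRITE_OLD)


for stage in ROLLOUT:
    print(hex(stage), describe(stage), "reversible:", is_reversible(stage))
```

Only the last stage stops writing to user_properties, which is why the checklist calls it the point of no return.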

Event Timeline

Change 678386 merged by jenkins-bot:

[operations/mediawiki-config@master] Explicitly set wgGEMentorshipMigrationStage: WRITE_OLD/READ_OLD

https://gerrit.wikimedia.org/r/678386

Change 678389 merged by jenkins-bot:

[operations/mediawiki-config@master] labs: Set GEMentorshipMigrationStage to SCHEMA_COMPAT_WRITE_BOTH | SCHEMA_COMPAT_READ_OLD

https://gerrit.wikimedia.org/r/678389

Mentioned in SAL (#wikimedia-operations) [2021-04-12T11:19:20Z] <urbanecm@deploy1002> Synchronized wmf-config/InitialiseSettings.php: NO-OP: 6c03d6a59086fa42ec4fc9d289c819a4d3b8e052: Explicitly set wgGEMentorshipMigrationStage: WRITE_OLD/READ_OLD (T279853) (duration: 00m 58s)

Mentioned in SAL (#wikimedia-releng) [2021-04-12T15:40:49Z] <Urbanecm> deployment-prep: urbanecm@deployment-deploy01:~$ mwscript extensions/GrowthExperiments/maintenance/migrateMentorMenteeRelationship.php --wiki=cswiki # T279853

Mentioned in SAL (#wikimedia-releng) [2021-04-12T15:46:52Z] <Urbanecm> deployment-prep: Run mwscript extensions/GrowthExperiments/maintenance/migrateMentorMenteeRelationship.php on all beta wikis with GrowthExperiments installed (wikis that are both in all-labs and growthexperiments, plus enwiki; T279853)

The script ran fine on cswiki, and the row counts appear to match for arwiki as well:

urbanecm@deployment-deploy01:~$ sql arwiki
MariaDB [arwiki]> select count(*) from user_properties where up_property="growthexperiments-mentor-id";
+----------+
| count(*) |
+----------+
|      113 |
+----------+
1 row in set (0.00 sec)

MariaDB [arwiki]>
urbanecm@deployment-deploy01:~$ mwscript extensions/GrowthExperiments/maintenance/migrateMentorMenteeRelationship.php --wiki=arwiki --dry-run
... processing users from 1 to 200
Would update 87 rows...
... processing users from 201 to 400
Would update 26 rows...
Script finished
---------------------
Would update 113 rows
urbanecm@deployment-deploy01:~$

Executed it for the rest of the beta wikis as well. It takes a while for enwiki.
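
The dry-run output above suggests the script walks user IDs in ranges of 200 and reports a per-batch and a total row count. The following is a rough Python sketch of that pattern under those assumptions; it is not the actual migrateMentorMenteeRelationship.php logic, and the function name, the dict-based stores, and the batch size default are hypothetical stand-ins.

```python
def migrate_in_batches(old_rows, new_rows, max_user_id, batch_size=200, dry_run=False):
    """Copy mentee -> mentor pairs from old_rows to new_rows, one user-ID range at a time.

    old_rows and new_rows are plain dicts standing in for user_properties and
    growthexperiments_mentor_mentee. Returns the number of rows handled.
    """
    total = 0
    for start in range(1, max_user_id + 1, batch_size):
        end = start + batch_size - 1
        print(f"... processing users from {start} to {end}")
        # Select only the rows whose user ID falls into the current range.
        batch = {uid: mentor for uid, mentor in old_rows.items() if start <= uid <= end}
        if not dry_run:
            new_rows.update(batch)
        verb = "Would update" if dry_run else "Updated"
        print(f"{verb} {len(batch)} rows...")
        total += len(batch)
    return total
```

With dry_run=True the new store is left untouched, so the returned total can be compared against a `select count(*)` on the old store before anything is written.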

Change 678613 had a related patch set uploaded (by Urbanecm; author: Urbanecm):

[operations/mediawiki-config@master] labs: Set mentor migration stage to SCHEMA_COMPAT_WRITE_BOTH | SCHEMA_COMPAT_READ_NEW

https://gerrit.wikimedia.org/r/678613

Change 678613 merged by jenkins-bot:

[operations/mediawiki-config@master] labs: Set mentor migration stage to SCHEMA_COMPAT_WRITE_BOTH | SCHEMA_COMPAT_READ_NEW

https://gerrit.wikimedia.org/r/678613

Urbanecm_WMF updated the task description.
Urbanecm_WMF set Due Date to Thu, Apr 22, 10:00 PM.
Urbanecm_WMF changed the subtype of this task from "Task" to "Deadline".
kostajh triaged this task as Medium priority. (Apr 15 2021, 8:01 AM)

Change 680302 had a related patch set uploaded (by Urbanecm; author: Urbanecm):

[operations/mediawiki-config@master] testwiki: wgGEMentorshipMigrationStage: Set to WRITE_BOTH/READ_OLD

https://gerrit.wikimedia.org/r/680302

Change 680303 had a related patch set uploaded (by Urbanecm; author: Urbanecm):

[operations/mediawiki-config@master] testwiki: wgGEMentorshipMigrationStage: Set to WRITE_BOTH/READ_NEW

https://gerrit.wikimedia.org/r/680303

Change 680304 had a related patch set uploaded (by Urbanecm; author: Urbanecm):

[operations/mediawiki-config@master] wgGEMentorshipMigrationStage: Set to WRITE_BOTH/READ_OLD everywhere

https://gerrit.wikimedia.org/r/680304

Change 680302 merged by jenkins-bot:

[operations/mediawiki-config@master] testwiki: wgGEMentorshipMigrationStage: Set to WRITE_BOTH/READ_OLD

https://gerrit.wikimedia.org/r/680302

Mentioned in SAL (#wikimedia-operations) [2021-04-19T11:10:05Z] <urbanecm@deploy1002> Synchronized wmf-config/InitialiseSettings.php: 03f8ed819091624f5ae4a8d7ed3631dc322fabcd: testwiki: wgGEMentorshipMigrationStage: Set to WRITE_BOTH/READ_OLD (T279853) (duration: 00m 57s)

Mentioned in SAL (#wikimedia-operations) [2021-04-19T11:11:23Z] <Urbanecm> [urbanecm@mwmaint1002 ~]$ mwscript extensions/GrowthExperiments/maintenance/migrateMentorMenteeRelationship.php --wiki=testwiki # T279853

Mentioned in SAL (#wikimedia-operations) [2021-04-19T11:27:48Z] <Urbanecm> [urbanecm@mwmaint1002 ~]$ mwscript extensions/GrowthExperiments/maintenance/migrateMentorMenteeRelationship.php --wiki=testwiki --force # T279853

Change 681063 had a related patch set uploaded (by Urbanecm; author: Urbanecm):

[operations/mediawiki-config@master] cswiki: wgGEMentorshipMigrationStage: Set to WRITE_BOTH/READ_OLD

https://gerrit.wikimedia.org/r/681063

Change 681063 merged by jenkins-bot:

[operations/mediawiki-config@master] cswiki: wgGEMentorshipMigrationStage: Set to WRITE_BOTH/READ_OLD

https://gerrit.wikimedia.org/r/681063

So, the testwiki migration worked: there is the expected number of rows in both user_properties and the new table. Trying cswiki now.

Mentioned in SAL (#wikimedia-operations) [2021-04-19T12:34:29Z] <urbanecm@deploy1002> Synchronized wmf-config/InitialiseSettings.php: 3e3cce192f1e99cbcae739f234271411d10974ac: cswiki: wgGEMentorshipMigrationStage: Set to WRITE_BOTH/READ_OLD (T279853) (duration: 00m 58s)

Mentioned in SAL (#wikimedia-operations) [2021-04-19T12:38:43Z] <Urbanecm> mwscript extensions/GrowthExperiments/maintenance/migrateMentorMenteeRelationship.php --wiki=cswiki # T279853

Change 680303 merged by jenkins-bot:

[operations/mediawiki-config@master] testwiki: wgGEMentorshipMigrationStage: Set to WRITE_BOTH/READ_NEW

https://gerrit.wikimedia.org/r/680303

Mentioned in SAL (#wikimedia-operations) [2021-04-19T12:51:32Z] <urbanecm@deploy1002> Synchronized wmf-config/InitialiseSettings.php: ef0f68e2a9c1c638911bb06c47ba6e8ef88ee393: testwiki: wgGEMentorshipMigrationStage: Set to WRITE_BOTH/READ_NEW (T279853) (duration: 00m 57s)

Change 680304 merged by jenkins-bot:

[operations/mediawiki-config@master] wgGEMentorshipMigrationStage: Set to WRITE_BOTH/READ_OLD everywhere

https://gerrit.wikimedia.org/r/680304

Mentioned in SAL (#wikimedia-operations) [2021-04-19T12:58:46Z] <urbanecm@deploy1002> Synchronized wmf-config/InitialiseSettings.php: bd076306c0ae0428ff13743f499b2a02d42b6eab: wgGEMentorshipMigrationStage: Set to WRITE_BOTH/READ_OLD everywhere (T279853) (duration: 00m 57s)

Mentioned in SAL (#wikimedia-operations) [2021-04-20T18:34:16Z] <Urbanecm> mwscript extensions/GrowthExperiments/maintenance/migrateMentorMenteeRelationship.php --wiki=idwiki # T279853

idwiki migration worked fine. Number of rows matches:

wikiadmin@10.64.48.111(idwiki)> select count(*) from growthexperiments_mentor_mentee;
+----------+
| count(*) |
+----------+
|     5725 |
+----------+
1 row in set (0.00 sec)

wikiadmin@10.64.48.111(idwiki)>

wikiadmin@10.64.0.204(idwiki)> select count(*) from user_properties where up_property="growthexperiments-mentor-id";
+----------+
| count(*) |
+----------+
|     5725 |
+----------+
1 row in set (0.02 sec)

wikiadmin@10.64.0.204(idwiki)>

and the warning from T280525 did not appear either. I'll run it on the remaining wikis.

Mentioned in SAL (#wikimedia-operations) [2021-04-20T20:13:03Z] <Urbanecm> [urbanecm@mwmaint1002 ~]$ mwscript extensions/GrowthExperiments/maintenance/migrateMentorMenteeRelationship.php --wiki=bnwiki # T279853

Mentioned in SAL (#wikimedia-operations) [2021-04-20T20:15:30Z] <Urbanecm> [urbanecm@mwmaint1002 ~]$ mwscript extensions/GrowthExperiments/maintenance/migrateMentorMenteeRelationship.php --wiki=euwiki # T279853

Mentioned in SAL (#wikimedia-operations) [2021-04-20T20:16:46Z] <Urbanecm> [urbanecm@mwmaint1002 ~]$ mwscript extensions/GrowthExperiments/maintenance/migrateMentorMenteeRelationship.php --wiki=frwiktionary # T279853

Mentioned in SAL (#wikimedia-operations) [2021-04-20T20:18:33Z] <Urbanecm> [urbanecm@mwmaint1002 ~]$ mwscript extensions/GrowthExperiments/maintenance/migrateMentorMenteeRelationship.php --wiki=hewiki # T279853

Mentioned in SAL (#wikimedia-operations) [2021-04-20T20:21:00Z] <Urbanecm> [urbanecm@mwmaint1002 ~]$ mwscript extensions/GrowthExperiments/maintenance/migrateMentorMenteeRelationship.php --wiki=hrwiki # T279853

Just out of curiosity, I ran the script on one of the wikis with time:

Script finished
---------------------
Updated 14980 rows

real    2m48.395s
user    0m57.332s
sys     0m9.400s

Migration continues, so far, no issues noticed.

Mentioned in SAL (#wikimedia-operations) [2021-04-20T20:22:37Z] <Urbanecm> [urbanecm@mwmaint1002 ~]$ mwscript extensions/GrowthExperiments/maintenance/migrateMentorMenteeRelationship.php --wiki=huwiki # T279853

Mentioned in SAL (#wikimedia-operations) [2021-04-20T20:27:43Z] <Urbanecm> [urbanecm@mwmaint1002 ~]$ mwscript extensions/GrowthExperiments/maintenance/migrateMentorMenteeRelationship.php --wiki=hywiki # T279853

Mentioned in SAL (#wikimedia-operations) [2021-04-20T20:29:54Z] <Urbanecm> [urbanecm@mwmaint1002 ~]$ mwscript extensions/GrowthExperiments/maintenance/migrateMentorMenteeRelationship.php --wiki=rowiki # T279853

Mentioned in SAL (#wikimedia-operations) [2021-04-20T20:30:54Z] <Urbanecm> [urbanecm@mwmaint1002 ~]$ mwscript extensions/GrowthExperiments/maintenance/migrateMentorMenteeRelationship.php --wiki=srwiki # T279853

Mentioned in SAL (#wikimedia-operations) [2021-04-20T20:32:43Z] <Urbanecm> [urbanecm@mwmaint1002 ~]$ mwscript extensions/GrowthExperiments/maintenance/migrateMentorMenteeRelationship.php --wiki=svwiki # T279853

Mentioned in SAL (#wikimedia-operations) [2021-04-20T20:34:33Z] <Urbanecm> [urbanecm@mwmaint1002 ~]$ mwscript extensions/GrowthExperiments/maintenance/migrateMentorMenteeRelationship.php --wiki=tewiki # T279853

Mentioned in SAL (#wikimedia-operations) [2021-04-20T20:36:14Z] <Urbanecm> [urbanecm@mwmaint1002 ~]$ mwscript extensions/GrowthExperiments/maintenance/migrateMentorMenteeRelationship.php --wiki=ukwiki # T279853

Mentioned in SAL (#wikimedia-operations) [2021-04-20T20:41:22Z] <Urbanecm> [urbanecm@mwmaint1002 ~]$ mwscript extensions/GrowthExperiments/maintenance/migrateMentorMenteeRelationship.php --wiki=viwiki # T279853

Mentioned in SAL (#wikimedia-operations) [2021-04-20T20:52:57Z] <Urbanecm> [urbanecm@mwmaint1002 ~]$ mwscript extensions/GrowthExperiments/maintenance/migrateMentorMenteeRelationship.php --wiki=ruwiki # T279853

So, I'm done with the migrations for the day. I noticed something weird when I ran ruwiki (the biggest migrated wiki in terms of rows copied so far):

wikiadmin@10.64.32.114(ruwiki)> select count(*) from user_properties where up_property="growthexperiments-mentor-id";
+----------+
| count(*) |
+----------+
|    79087 |
+----------+
1 row in set (0.04 sec)

wikiadmin@10.64.48.111(ruwiki)> select count(*) from growthexperiments_mentor_mentee;
+----------+
| count(*) |
+----------+
|    79089 |
+----------+
1 row in set (0.02 sec)

The number of rows is supposed to match. I'll dump the two queries on stat1005, and investigate a bit.

My stat1005 fiddling:

[urbanecm@stat1005 ~/tmp]$ echo 'select * from user_properties where up_property="growthexperiments-mentor-id";' | analytics-mysql ruwiki > old_properties.txt
[urbanecm@stat1005 ~/tmp]$ echo 'select * from growthexperiments_mentor_mentee' | analytics-mysql ruwiki --use-x1 > new_table.txt
[urbanecm@stat1005 ~/tmp]$ cut -f 1 new_table.txt > new_mentees.txt
[urbanecm@stat1005 ~/tmp]$ cut -f 1 old_properties.txt > old_mentees.txt
[urbanecm@stat1005 ~/tmp]$ sed -i 1d old_mentees.txt
[urbanecm@stat1005 ~/tmp]$ sed -i 1d new_mentees.txt
[urbanecm@stat1005 ~/tmp]$ sort -n new_mentees.txt > new_sorted_mentees.txt
[urbanecm@stat1005 ~/tmp]$ sort -n old_mentees.txt > old_sorted_mentees.txt
[urbanecm@stat1005 ~/tmp]$ git diff old_sorted_mentees.txt new_sorted_mentees.txt
diff --git a/old_sorted_mentees.txt b/new_sorted_mentees.txt
index 1c8da75..6da82d7 100644
--- a/old_sorted_mentees.txt
+++ b/new_sorted_mentees.txt
@@ -78687,6 +78687,7 @@
 2973053
 2973055
 2973058
+2973061
 2973063
 2973065
 2973066
@@ -78722,6 +78723,7 @@
 2973131
 2973132
 2973133
+2973134
 2973135
 2973137
 2973139
[urbanecm@stat1005 ~/tmp]$ analytics-mysql ruwiki
mysql:research@dbstore1005.eqiad.wmnet [ruwiki]> select count(*) from user_properties where up_user in (2973061,2973134) and up_property="growthexperiments-mentor-id";
+----------+
| count(*) |
+----------+
|        0 |
+----------+
1 row in set (0.001 sec)

mysql:research@dbstore1005.eqiad.wmnet [ruwiki]> Bye
[urbanecm@stat1005 ~/tmp]$ analytics-mysql ruwiki --use-x1
mysql:research@dbstore1005.eqiad.wmnet [ruwiki]> select count(*) from growthexperiments_mentor_mentee where gemm_mentee_id  in (2973061,2973134);
+----------+
| count(*) |
+----------+
|        2 |
+----------+
1 row in set (0.001 sec)

mysql:research@dbstore1005.eqiad.wmnet [ruwiki]> Bye

So, two users are in the new mentor/mentee store, but are not in the old one. @Tgr, do you have any idea why that might happen? I was thinking that locking the rows by the maintenance script might be the reason behind this, but I'm unable to find a matching error in logstash.

Also, I'm wondering whether this is a reason to worry. It's a data inconsistency, but as soon as the users in question load their homepage for the second time (with migration stage still being READ_OLD), we'll automatically give them a new mentor, and update the DB table. If they access homepage after switching migration stage to READ_NEW, they'll keep using whoever they have assigned in the DB table, while the properties-powered information will never be actually used again.

I'll hold off on running the script on more wikis until Tgr comments.
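
The stat1005 comparison above boils down to a set difference between the mentee IDs in the two stores. Below is a minimal Python sketch of the same check, assuming the IDs have already been exported to lists; the function name `compare_mentees` is hypothetical, and the sample IDs are a small excerpt taken from the diff output above.

```python
def compare_mentees(old_ids, new_ids):
    """Return (only_in_new, only_in_old) as sorted lists of mentee IDs."""
    old_set, new_set = set(old_ids), set(new_ids)
    return sorted(new_set - old_set), sorted(old_set - new_set)


# Excerpt of the ruwiki IDs around the first discrepancy.
old_ids = [2973053, 2973055, 2973058, 2973063, 2973065]
new_ids = [2973053, 2973055, 2973058, 2973061, 2973063, 2973065]

only_in_new, only_in_old = compare_mentees(old_ids, new_ids)
print("only in growthexperiments_mentor_mentee:", only_in_new)
print("only in user_properties:", only_in_old)
```

Unlike a bare row count, this also catches the case where both stores hold the same number of rows but for different users.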

I agree that small-scale inconsistencies are not worth blocking the migration, as mentor data is currently not used for anything that important. Even if a user's mentor changed, it would be confusing but not terrible, and the discrepancies seen here are not user-visible at all.

As to why it happens: the migration script is incapable of writing a growthexperiments_mentor_mentee record without a matching user_properties record, and we don't ever delete mentors, so I guess it is a random partial failure in MultiWriteMentorStore when setting a mentor. Either it happens in a GET context (so the two writes run in separate jobs, and one can fail without affecting the other, e.g. due to a CAS error), or the main DB connection gets rolled back for some reason and the x1 one doesn't.
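
The partial-failure scenario described above can be illustrated with a toy model. This is a hedged Python sketch, not the real MultiWriteMentorStore: the classes `FlakyStore` and `MultiWriteStore`, the simulated failure, and the mentor ID 42 are all made up for illustration. It only shows how two independent writes can leave a row in the new store with no counterpart in the old one.

```python
class FlakyStore(dict):
    """A dict-backed store that fails writes for selected mentee IDs."""

    def __init__(self, fail_on=()):
        super().__init__()
        self.fail_on = set(fail_on)

    def set_mentor(self, mentee, mentor):
        if mentee in self.fail_on:
            raise RuntimeError("simulated write failure")
        self[mentee] = mentor


class MultiWriteStore:
    """Writes to both stores independently, so one write can fail on its own."""

    def __init__(self, old, new):
        self.old, self.new = old, new

    def set_mentor(self, mentee, mentor):
        failed = []
        for name, store in (("old", self.old), ("new", self.new)):
            try:
                store.set_mentor(mentee, mentor)
            except RuntimeError:
                # The failure is isolated: the other write is not rolled back.
                failed.append(name)
        return failed


old = FlakyStore(fail_on={2973061})  # the write to the old store fails for this mentee
new = FlakyStore()
store = MultiWriteStore(old, new)
failed = store.set_mentor(2973061, 42)
print("failed writes:", failed, "| in old:", 2973061 in old, "| in new:", 2973061 in new)
```

A store that wrapped both writes in one atomic unit would avoid this, but with the two tables on different database clusters (main DB and x1) that is not straightforward.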

Mentioned in SAL (#wikimedia-operations) [2021-04-21T13:18:53Z] <Urbanecm> [urbanecm@mwmaint1002 ~]$ time mwscript extensions/GrowthExperiments/maintenance/migrateMentorMenteeRelationship.php --wiki=frwiki # T279853

Mentioned in SAL (#wikimedia-operations) [2021-04-21T13:54:51Z] <Urbanecm> [urbanecm@mwmaint1002 ~]$ time mwscript extensions/GrowthExperiments/maintenance/migrateMentorMenteeRelationship.php --wiki=fawiki # T279853

Mentioned in SAL (#wikimedia-operations) [2021-04-21T15:15:38Z] <Urbanecm> urbanecm@mwmaint1002:~$ foreachwikiindblist growthexperiments extensions/GrowthExperiments/maintenance/migrateMentorMenteeRelationship.php # T279853

Script finished everywhere, with the number of rows mostly matching (there were a few other discrepancies, similar to the ruwiki one). We should be ready for READ_NEW everywhere.

Change 681750 had a related patch set uploaded (by Urbanecm; author: Urbanecm):

[operations/mediawiki-config@master] Set wgGEMentorshipMigrationStage to WRITE_BOTH/READ_NEW everywhere

https://gerrit.wikimedia.org/r/681750

Change 681750 merged by jenkins-bot:

[operations/mediawiki-config@master] Set wgGEMentorshipMigrationStage to WRITE_BOTH/READ_NEW everywhere

https://gerrit.wikimedia.org/r/681750

Mentioned in SAL (#wikimedia-operations) [2021-04-21T18:14:08Z] <urbanecm@deploy1002> Synchronized wmf-config/InitialiseSettings.php: 1ae5ca5467fad7bfdae8aa94b241fe6c048ab8e5: Set wgGEMentorshipMigrationStage to WRITE_BOTH/READ_NEW everywhere (T279853) (duration: 00m 59s)

Urbanecm_WMF added a subscriber: Etonkovidova.

@Etonkovidova Hello Elena, please do ping me if there is anything wrong with mentorship features. Migration was completed, and the new database table is now the source of truth. I'll wait a week before stopping writing to the old mentorship store.

Urbanecm_WMF changed Due Date from Thu, Apr 22, 10:00 PM to Thu, Apr 29, 10:00 PM.

Change 683430 had a related patch set uploaded (by Urbanecm; author: Urbanecm):

[operations/mediawiki-config@master] Set wgGEMentorshipMigrationStage to SCHEMA_COMPAT_NEW everywhere

https://gerrit.wikimedia.org/r/683430

It would be nice to deploy SCHEMA_COMPAT_NEW before Wednesday, as the x1 migration will cause the DB store to be read-only while the pref store is still writable. In theory a read-only error would cause both connections to be rolled back so they would not get out of sync, and in theory even if they do get out of sync it shouldn't cause any issue - still, one less potential (if unlikely) source of trouble.

I originally planned to do it last week, but I failed to. The patch should be safe, and I hope to deploy it on Monday or Tuesday.

Change 683430 merged by jenkins-bot:

[operations/mediawiki-config@master] Set wgGEMentorshipMigrationStage to SCHEMA_COMPAT_NEW everywhere

https://gerrit.wikimedia.org/r/683430

Mentioned in SAL (#wikimedia-operations) [2021-05-03T11:11:10Z] <urbanecm@deploy1002> Synchronized wmf-config/InitialiseSettings.php: c5a7c67b4daf33e0f9aaabec3f35ab6d4184894b: Set wgGEMentorshipMigrationStage to SCHEMA_COMPAT_NEW everywhere (T279853) (duration: 00m 57s)

✅ This should now be done. I checked it at cswiki (changed my own user's mentor), and it worked – user properties were just left alone.