Make mentee overview module's updateMenteeData.php scale better
Closed, Resolved (Public)

Description

Background

The Growth team is currently (September 2021) building the mentor dashboard, which allows mentors to view information about their mentees, such as their number of edits and reverted edits. Because calculating those numbers takes a long time, the data is only updated daily (sequentially, for all wikis with the mentor dashboard enabled).

Problem

If we enable the mentor dashboard backend on all Growth wikis, the updating script will very likely take more than 24 hours to finish, because the Growth features are now deployed to English Wikipedia and other large wikis.

Running a script that takes more than 24 hours on a daily schedule is not sustainable. As such, before deploying to all Growth wikis, the updateMenteeData.php script should be changed to run faster.

Possible solutions

One solution would be to update multiple wikis in parallel (running different DB sections in parallel should be perfectly okay, as it's the database queries that take long -- the script itself does not consume significant resources on mwmaint).

We can also make updateMenteeData.php itself run faster, either by optimizing the SQL queries where possible, or by making other changes to its code. For instance, it currently likely calculates the last active timestamp twice, while it only needs to happen once per update.
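
For illustration, a minimal sketch of the kind of per-update memoization meant here (hypothetical class and method names, not the actual GrowthExperiments code):

// Hypothetical sketch: compute the expensive last-active timestamps at most
// once per update and reuse the cached result afterwards.
class MenteeDataProvider {
	private ?array $lastActiveTimestamps = null;

	public function getLastActiveTimestamps( array $userIds ): array {
		if ( $this->lastActiveTimestamps === null ) {
			// Stand-in for the real (expensive) revision/actor query.
			$this->lastActiveTimestamps = array_fill_keys( $userIds, date( 'YmdHis' ) );
		}
		return $this->lastActiveTimestamps;
	}
}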

Details

Repo | Branch | Lines +/-
operations/puppet | production | +0 -5
operations/puppet | production | +1 -1
mediawiki/extensions/GrowthExperiments | wmf/1.38.0-wmf.3 | +29 -20
mediawiki/extensions/GrowthExperiments | wmf/1.38.0-wmf.3 | +19 -3
mediawiki/extensions/GrowthExperiments | wmf/1.38.0-wmf.3 | +10 -0
mediawiki/extensions/GrowthExperiments | wmf/1.38.0-wmf.4 | +19 -3
mediawiki/extensions/GrowthExperiments | wmf/1.38.0-wmf.4 | +10 -0
mediawiki/extensions/GrowthExperiments | master | +29 -20
mediawiki/extensions/GrowthExperiments | master | +19 -3
mediawiki/extensions/GrowthExperiments | master | +10 -0
mediawiki/extensions/GrowthExperiments | master | +15 -1
mediawiki/extensions/GrowthExperiments | wmf/1.38.0-wmf.3 | +97 -7
mediawiki/extensions/GrowthExperiments | wmf/1.38.0-wmf.3 | +2 -2
mediawiki/extensions/GrowthExperiments | master | +2 -2
mediawiki/extensions/GrowthExperiments | master | +97 -7
mediawiki/extensions/GrowthExperiments | master | +0 -1
mediawiki/extensions/GrowthExperiments | master | +41 -21
mediawiki/extensions/GrowthExperiments | wmf/1.38.0-wmf.3 | +41 -21
operations/puppet | production | +10 -0
operations/mediawiki-config | master | +3 -1
mediawiki/extensions/GrowthExperiments | wmf/1.38.0-wmf.1 | +23 -1
mediawiki/extensions/GrowthExperiments | master | +23 -1

Event Timeline

Please move this back to In Progress once it's code-reviewed, as I will need to think about more ways this can be made faster, or at least measure the impact.

Change 723177 merged by jenkins-bot:

[mediawiki/extensions/GrowthExperiments@master] updateMenteeData: Do not calculate last edit timestamp twice

https://gerrit.wikimedia.org/r/723177

Change 723781 had a related patch set uploaded (by Urbanecm; author: Urbanecm):

[mediawiki/extensions/GrowthExperiments@wmf/1.38.0-wmf.1] updateMenteeData: Do not calculate last edit timestamp twice

https://gerrit.wikimedia.org/r/723781

Change 723781 merged by Urbanecm:

[mediawiki/extensions/GrowthExperiments@wmf/1.38.0-wmf.1] updateMenteeData: Do not calculate last edit timestamp twice

https://gerrit.wikimedia.org/r/723781

Mentioned in SAL (#wikimedia-operations) [2021-09-27T14:36:55Z] <urbanecm@deploy1002> Synchronized php-1.38.0-wmf.1/extensions/GrowthExperiments/: 08f1e73: 3b154db: GrowthExperiments backports (T290609, T291658) (duration: 00m 58s)

This mostly helped (most wikis now run faster by at least a hundred seconds; arwiki is the only exception).

Runtime before the change:

[urbanecm@mwmaint1002 /var/log/mediawiki/mediawiki_job_growthexperiments-updateMenteeData]$ grep 'Sep 27' syslog.log | grep Done | sed 's/.*\[11559\]: //g'
arwiki:  Done. Took 12136 seconds.
bnwiki:  Done. Took 614 seconds.
cswiki:  Done. Took 2623 seconds.
testwiki:  Done. Took 6 seconds.
viwiki:  Done. Took 1024 seconds.
[urbanecm@mwmaint1002 /var/log/mediawiki/mediawiki_job_growthexperiments-updateMenteeData]$

Runtime after the change:

[urbanecm@mwmaint1002 ~]$ /usr/local/bin/foreachwikiindblist /srv/mediawiki/dblists/growthexperiments.dblist extensions/GrowthExperiments/maintenance/updateMenteeData.php --statsd > updateMenteeData.log
[urbanecm@mwmaint1002 ~]$ grep Done updateMenteeData.log
arwiki:  Done. Took 13628 seconds.
bnwiki:  Done. Took 179 seconds.
cswiki:  Done. Took 788 seconds.
testwiki:  Done. Took 3 seconds.
viwiki:  Done. Took 909 seconds.
[urbanecm@mwmaint1002 ~]$

Boldly closing.

Eh, still not enough. The change works, though. I still need to figure out how to make all wikis run within 24 hours (especially with arwiki, frwiki and enwiki being Growth wikis).

Runtimes by day for pilot wikis

Day | Wiki | Seconds
Sep 26 | arwiki | 14851
Sep 26 | bnwiki | 623
Sep 26 | cswiki | 2429
Sep 26 | viwiki | 993
Sep 27 | arwiki | 12136
Sep 27 | bnwiki | 614
Sep 27 | cswiki | 2623
Sep 27 | viwiki | 1024
Sep 28 | arwiki | 13370
Sep 28 | bnwiki | 180
Sep 28 | cswiki | 787
Sep 28 | viwiki | 1225
Sep 29 | arwiki | 10223
Sep 29 | bnwiki | 180
Sep 29 | cswiki | 711
Sep 29 | viwiki | 1243
Sep 30 | arwiki | 11545
Sep 30 | bnwiki | 176
Sep 30 | cswiki | 794
Sep 30 | viwiki | 945

I made the change on Sep 27, but after the regular update had already run, so Sep 28 is the first day with the change. As you can see, this helped significantly for cswiki, but not so much for bigger wikis like arwiki. I'll do a test run for frwiki just to see how fast it is now (the last run took about 48 hours).

Change 725263 had a related patch set uploaded (by Urbanecm; author: Urbanecm):

[operations/mediawiki-config@master] Let DB expressions intersect DB lists

https://gerrit.wikimedia.org/r/725263

Change 725264 had a related patch set uploaded (by Urbanecm; author: Urbanecm):

[operations/puppet@production] WIP: growthexperiments: Run updateMenteeData.php in parallel

https://gerrit.wikimedia.org/r/725264

Change 725286 had a related patch set uploaded (by Urbanecm; author: Urbanecm):

[operations/puppet@production] growthexperiments: Remove absented systemd job

https://gerrit.wikimedia.org/r/725286

Mentioned in SAL (#wikimedia-operations) [2021-10-01T09:37:59Z] <urbanecm> [urbanecm@mwmaint1002 ~]$ mwscript extensions/GrowthExperiments/maintenance/updateMenteeData.php --wiki=frwiki --force # to get an idea about timing for T290609, runs in a tmux session under my account

Change 725263 merged by jenkins-bot:

[operations/mediawiki-config@master] Let DB expressions intersect DB lists

https://gerrit.wikimedia.org/r/725263

Mentioned in SAL (#wikimedia-operations) [2021-10-04T11:55:02Z] <urbanecm@deploy1002> Synchronized multiversion/MWWikiversions.php: 508cf5cc6d213373f7c9ba1cdef142ebc8398022: Let DB expressions intersect DB lists (T290609) (duration: 00m 58s)

I'll do a test run for frwiki just to see how fast it is now (last run was about 48 hours).

[urbanecm@mwmaint1002 ~]$ mwscript extensions/GrowthExperiments/maintenance/updateMenteeData.php --wiki=frwiki --force
Done. Took 149313 seconds.
[urbanecm@mwmaint1002 ~]$

So..."only" 41 hours. Still way more than I'd want it to be :/.

Have you profiled to see which steps are slow? At a very quick glance, is there a reason the select/insert/update of mentee_data aren't batched?

Have you profiled to see which steps are slow?

I have to admit I didn't do actual profiling; I "know" which parts of the script are slow from pre-coding experiments on the analytics replicas with raw SQL -- I just didn't expect them to be this slow. The queries in UncachedMenteeOverviewDataProvider::getBlocksForUsers are the slowest part of the script. Effectively, that function runs something like this:

SELECT user_id, COUNT(log_id) AS `blocks`
FROM `user`
LEFT JOIN (
    SELECT user_id AS `blocked_user`, log_id
    FROM `logging`
    JOIN `user` ON (REPLACE(log_title, "_", " ") = user_name)
    WHERE log_type = 'block' AND log_action = 'block' AND user_id IN (@userIds@)
) `blocks_made` ON (user_id = blocked_user)
WHERE user_id IN (@userIds@)

where @userIds@ is a list of user IDs we want to inspect (mentees assigned to a particular mentor). The impact can be shown by running the script twice, once with the blocks function disabled (cswiki):

[urbanecm@mwmaint1002 ~]$ mwscript extensions/GrowthExperiments/maintenance/updateMenteeData.php --wiki=cswiki
Done. Took 705 seconds.
[urbanecm@mwmaint1002 ~]$ sudo -u mwdeploy vim /srv/mediawiki/php-1.38.0-wmf.2/extensions/GrowthExperiments/includes/MentorDashboard/MenteeOverview/UncachedMenteeOverviewDataProvider.php # disable the blocks function
[urbanecm@mwmaint1002 ~]$ mwscript extensions/GrowthExperiments/maintenance/updateMenteeData.php --wiki=cswiki
Done. Took 262 seconds.
[urbanecm@mwmaint1002 ~]$ scap pull
17:00:01 Copying from deploy1002.eqiad.wmnet to mwmaint1002.eqiad.wmnet
17:00:01 Started rsync common
17:00:07 Finished rsync common (duration: 00m 05s)
17:00:07 Started scap-cdb-rebuild
17:00:07 Finished scap-cdb-rebuild (duration: 00m 00s)
17:00:07 Checking if php-fpm restart needed
[urbanecm@mwmaint1002 ~]$

I'm not sure how to make a query that JOINs on a REPLACE faster -- I don't see the blockee stored in any other column :(.

The rest of the queries should be relatively simple (and make use of indexes); see the notebook I prepared when thinking about how to generate the data. It no longer represents the current state of UncachedMenteeOverviewDataProvider exactly (there were slight changes, such as the implementation of the "edited less than 6 months ago" rule), and the notebook's purpose was to draft the SQL queries before writing MW code and to get a rough idea about timing.

I thought the difficulty of the query (queries) would depend on the number of mentees (arwiki: ~232k mentees and ~2 hours; frwiki: ~187k and ~42 hours), which... is apparently not quite true.

I can surely gather more data about the script's weak points if you think that would be helpful. I'm not sure whether it's better to measure queries in a notebook, or to add (semi-)permanent profiling code into the script itself -- advice would be appreciated.

One option might be to update most numbers daily (or maybe even once every 12 hours) and blocks weekly/once every three days/whatever, but that falls into the category of working around the issue rather than actually making it faster.

At a very quick glance, is there a reason the select/insert/update of mentee_data aren't batched?

Mentees are processed in batches of 200 (after each batch we wait for replication), and each mentee is inserted/updated via its own single insert/update query. I'm not sure what's better: multiple insert/update queries, or a single query updating many rows. I still feel the actual computation takes most of the time (of course, that's just a feeling).
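
For illustration, a hedged sketch of the single-query alternative, assuming a MediaWiki context with a primary handle $dbw, an LBFactory $lbFactory, and an assumed table/column layout (growthexperiments_mentee_data with mentee_id/mentee_data -- not necessarily the real schema):

// Hypothetical sketch: write a whole batch of 200 mentees with one multi-row
// query instead of one insert/update per mentee.
$rows = [];
foreach ( $batchData as $menteeId => $data ) {
	$rows[] = [
		'mentee_id' => (int)$menteeId,
		'mentee_data' => json_encode( $data ),
	];
}
// IDatabase::replace() issues a multi-row REPLACE INTO keyed on the unique
// key, replacing existing rows and inserting new ones in a single statement.
$dbw->replace(
	'growthexperiments_mentee_data',
	[ [ 'mentee_id' ] ],
	$rows,
	__METHOD__
);
// Wait for replication once per batch, as the script already does.
$lbFactory->waitForReplication();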

The plan for that query is actually not too bad:

mysql:research@dbstore1007.eqiad.wmnet [cswiki]> EXPLAIN SELECT  user_id,COUNT(log_id) AS `blocks`  FROM `user` LEFT JOIN (SELECT  user_id AS `blocked_user`,log_id  FROM `logging` JOIN `user` ON ((REPLACE(log_title, "_", " ")=user_name))   WHERE log_type = 'block' AND log_action = 'block' AND user_id IN (1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20)   ) `blocks_made` ON ((user_id=blocked_user))   WHERE user_id IN (1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20);
+------+-------------+---------+--------+-------------------------------+-----------------+---------+---------------------+--------+--------------------------+
| id   | select_type | table   | type   | possible_keys                 | key             | key_len | ref                 | rows   | Extra                    |
+------+-------------+---------+--------+-------------------------------+-----------------+---------+---------------------+--------+--------------------------+
|    1 | SIMPLE      | user    | range  | PRIMARY                       | PRIMARY         | 4       | NULL                | 20     | Using where; Using index |
|    1 | SIMPLE      | user    | eq_ref | PRIMARY,user_name             | PRIMARY         | 4       | cswiki.user.user_id | 1      | Using where              |
|    1 | SIMPLE      | logging | ref    | log_type_action,log_type_time | log_type_action | 68      | const,const         | 169114 | Using where              |
+------+-------------+---------+--------+-------------------------------+-----------------+---------+---------------------+--------+--------------------------+

It can be simplified a bit:

mysql:research@dbstore1007.eqiad.wmnet [cswiki]> EXPLAIN SELECT user_id, COUNT(log_id) AS `blocks`  FROM `user` LEFT JOIN `logging` ON (REPLACE(log_title, "_", " ")=user_name AND log_type = 'block' AND log_action = 'block') WHERE user_id IN (1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20) GROUP BY user_id;
+------+-------------+---------+-------+-------------------------------+-----------------+---------+-------------+--------+-------------+
| id   | select_type | table   | type  | possible_keys                 | key             | key_len | ref         | rows   | Extra       |
+------+-------------+---------+-------+-------------------------------+-----------------+---------+-------------+--------+-------------+
|    1 | SIMPLE      | user    | range | PRIMARY                       | PRIMARY         | 4       | NULL        | 20     | Using where |
|    1 | SIMPLE      | logging | ref   | log_type_action,log_type_time | log_type_action | 68      | const,const | 169114 | Using where |
+------+-------------+---------+-------+-------------------------------+-----------------+---------+-------------+--------+-------------+

but the difference seems marginal.

One possible improvement is adding log_namespace=2 to the join so the log_page_time index can be used (the only index which contains log_title). But probably the simpler approach is to just use a separate query to turn user IDs into usernames, do the underscore transformation in PHP, and then you end up with a query which matches the index:

mysql:research@dbstore1007.eqiad.wmnet [cswiki]> EXPLAIN SELECT log_title AS username, COUNT(log_id) AS blocks FROM logging WHERE log_namespace = 2 AND log_type = 'block' AND log_action = 'block' AND log_title IN ('Foo', 'Bar', 'Baz') GROUP BY log_title;
+------+-------------+---------+-------+---------------------------------------------+---------------+---------+------+------+------------------------------------+
| id   | select_type | table   | type  | possible_keys                               | key           | key_len | ref  | rows | Extra                              |
+------+-------------+---------+-------+---------------------------------------------+---------------+---------+------+------+------------------------------------+
|    1 | SIMPLE      | logging | range | log_type_action,log_type_time,log_page_time | log_page_time | 261     | NULL | 3    | Using index condition; Using where |
+------+-------------+---------+-------+---------------------------------------------+---------------+---------+------+------+------------------------------------+

...although I suppose this depends on the real-world distribution of data. Filtering via log_type_action means the DB engine needs to go through every block log entry. Filtering via log_page_time means it needs to go through every log entry of any type for the given user. I'd expect the latter to be much more performant, but maybe there are anomalous users with tons of log entries where it could be slow?
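
A minimal sketch of that two-step approach, assuming a MediaWiki context with a replica handle $dbr (helper and variable names are illustrative, not the eventual patch):

// Hypothetical sketch: resolve user IDs to names first, convert the names to
// log_title form (spaces -> underscores) in PHP, then count blocks via the
// log_page_time index.
$names = $dbr->selectFieldValues(
	'user',
	'user_name',
	[ 'user_id' => $userIds ],
	__METHOD__
);
$logTitles = array_map( static function ( $name ) {
	return str_replace( ' ', '_', $name );
}, $names );
$res = $dbr->select(
	'logging',
	[ 'log_title', 'blocks' => 'COUNT(log_id)' ],
	[
		'log_namespace' => NS_USER, // 2; matches the log_page_time index prefix
		'log_type' => 'block',
		'log_action' => 'block',
		'log_title' => $logTitles,
	],
	__METHOD__,
	[ 'GROUP BY' => 'log_title' ]
);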

Change 726755 had a related patch set uploaded (by Urbanecm; author: Urbanecm):

[mediawiki/extensions/GrowthExperiments@master] Mentee overview: Make UncachedMenteeOverviewDataProvider::getBlocksForUsers faster

https://gerrit.wikimedia.org/r/726755

[...] I'd expect the latter to be much more performant, but maybe there are anomalous users with tons of log entries where it could be slow?

There might be such users, but since the total number of block log entries per wiki can easily go into the millions, I'd say it's safe to say there will be fewer rows using @Tgr's approach:

mysql:research@dbstore1003.eqiad.wmnet [arwiki]> select count(*) from logging where log_type='block';
+----------+
| count(*) |
+----------+
|   210918 |
+----------+
1 row in set (0.131 sec)

mysql:research@dbstore1005.eqiad.wmnet [frwiki]> select count(*) from logging where log_type='block';
+----------+
| count(*) |
+----------+
|  3891872 |
+----------+
1 row in set (2.171 sec)

mysql:research@dbstore1003.eqiad.wmnet [enwiki]> select count(*) from logging where log_type='block';
+----------+
| count(*) |
+----------+
|  8543015 |
+----------+
1 row in set (2.820 sec)

mysql:research@dbstore1003.eqiad.wmnet [enwiki]>

This might also answer my question from a few comments above -- why frwiki takes significantly longer than arwiki: frwiki has fewer mentees, but significantly more (total) blocks.

Uploaded a patch. Tested locally that it produces the same data -- once merged, I'll test it in prod, too.

Change 725264 merged by Jbond:

[operations/puppet@production] growthexperiments: Run updateMenteeData.php in parallel

https://gerrit.wikimedia.org/r/725264

Change 726755 merged by jenkins-bot:

[mediawiki/extensions/GrowthExperiments@master] Mentee overview: Make UncachedMenteeOverviewDataProvider::getBlocksForUsers faster

https://gerrit.wikimedia.org/r/726755

Change 727498 had a related patch set uploaded (by Gergő Tisza; author: Urbanecm):

[mediawiki/extensions/GrowthExperiments@wmf/1.38.0-wmf.3] Mentee overview: Make UncachedMenteeOverviewDataProvider::getBlocksForUsers faster

https://gerrit.wikimedia.org/r/727498

Change 727498 merged by jenkins-bot:

[mediawiki/extensions/GrowthExperiments@wmf/1.38.0-wmf.3] Mentee overview: Make UncachedMenteeOverviewDataProvider::getBlocksForUsers faster

https://gerrit.wikimedia.org/r/727498

Mentioned in SAL (#wikimedia-operations) [2021-10-08T00:05:40Z] <tgr@deploy1002> Synchronized php-1.38.0-wmf.3/extensions/GrowthExperiments: Backport: [[gerrit:727498|Mentee overview: Make UncachedMenteeOverviewDataProvider::getBlocksForUsers faster (T290609)]] (duration: 00m 56s)

tgr@mwmaint1002:~$ mwscript extensions/GrowthExperiments/maintenance/updateMenteeData.php --wiki=cswiki --mentor 'Martin Urbanec'
Done. Took 59 seconds.
tgr@mwdebug1001:~$ mwscript extensions/GrowthExperiments/maintenance/updateMenteeData.php --wiki=cswiki --mentor 'Martin Urbanec'
Done. Took 25 seconds.

Change 728259 had a related patch set uploaded (by Urbanecm; author: Urbanecm):

[mediawiki/extensions/GrowthExperiments@master] UncachedMenteeOverviewDataProvider::getFilteredMenteesForMentor: Cast IDs to ints

https://gerrit.wikimedia.org/r/728259

That's quite an improvement! Thanks for the review, backport and test.

I just noticed that getFilteredMenteesForMentor() returns strings too (spotted via the slow query log in Tendril), which makes the query against actor run significantly slower:

mysql:research@dbstore1007.eqiad.wmnet [cswiki]> select count(*) from actor where actor_user in (...);
+----------+
| count(*) |
+----------+
|      223 |
+----------+
1 row in set (0.003 sec)

mysql:research@dbstore1007.eqiad.wmnet [cswiki]> select count(*) from actor where actor_user in (..., '123');
+----------+
| count(*) |
+----------+
|      224 |
+----------+
1 row in set (3.357 sec)

Uploading a patch to fix that. IIRC database connections aren't reliable about data types, so casting is safer anyway -- even if Tendril were wrong.
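
A minimal sketch of the fix, assuming the IDs come back from a previous query as numeric strings (variable names are illustrative, not the actual patch):

// Hypothetical sketch: cast the mentee IDs to integers before they reach the
// IN() list, so MySQL can use the index on actor_user without per-row casts.
$menteeIds = array_map( 'intval', $menteeIds );
$res = $dbr->select(
	'actor',
	[ 'actor_id', 'actor_user' ],
	[ 'actor_user' => $menteeIds ],
	__METHOD__
);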

Change 728655 had a related patch set uploaded (by Urbanecm; author: Urbanecm):

[mediawiki/extensions/GrowthExperiments@master] updateMenteeData: Collect more profiling data

https://gerrit.wikimedia.org/r/728655

Change 728656 had a related patch set uploaded (by Urbanecm; author: Urbanecm):

[operations/puppet@production] updatementeedata.pp: Update script parameters

https://gerrit.wikimedia.org/r/728656

Change 728657 had a related patch set uploaded (by Urbanecm; author: Urbanecm):

[mediawiki/extensions/GrowthExperiments@master] updateMenteeData: Remove unused parameter

https://gerrit.wikimedia.org/r/728657

Change 728657 abandoned by Urbanecm:

[mediawiki/extensions/GrowthExperiments@master] updateMenteeData: Remove unused parameter

Reason:

no longer needed

https://gerrit.wikimedia.org/r/728657

Live-tested the profiling patch on mwmaint1002:

[urbanecm@mwmaint1002 ~]$ mwscript ../../../../../home/urbanecm/updateMenteeData.php --wiki=cswiki --dbshard=s2 --mentor='Martin Urbanec' --verbose
Profiling data:
  * filtermentees: 1 seconds
  * edittimestampinternal: 0 seconds
  * usernames: 4 seconds
  * reverted: 5 seconds
  * questions: 4 seconds
  * editcount: 3 seconds
  * registration: 3 seconds
  * blocks: 5 seconds
===============
Done. Took 27 seconds.
[urbanecm@mwmaint1002 ~]$

Works as expected.

https://gerrit.wikimedia.org/r/c/mediawiki/extensions/GrowthExperiments/+/728259 is hopefully the last patch this needs (apart from the profiling one). Tested it in production:

[urbanecm@mwmaint1002 ~]$ mwscript extensions/GrowthExperiments/maintenance/updateMenteeData.php --wiki=cswiki --mentor='Martin Urbanec'
Done. Took 23 seconds.
[urbanecm@mwmaint1002 ~]$ sudo -u mwdeploy cp UncachedMenteeOverviewDataProvider.php /srv/mediawiki/php-1.38.0-wmf.3/extensions/GrowthExperiments/includes/MentorDashboard/MenteeOverview/UncachedMenteeOverviewDataProvider.php
[urbanecm@mwmaint1002 ~]$ mwscript extensions/GrowthExperiments/maintenance/updateMenteeData.php --wiki=cswiki --mentor='Martin Urbanec'
Done. Took 1 seconds.
[urbanecm@mwmaint1002 ~]$ grep 'Oct 10' /var/log/mediawiki/mediawiki_job_growthexperiments-updateMenteeData-s2/syslog.log | grep cswiki
Oct 10 04:15:01 mwmaint1002 mediawiki_job_growthexperiments-updateMenteeData-s2[2777]: cswiki
Oct 10 04:21:08 mwmaint1002 mediawiki_job_growthexperiments-updateMenteeData-s2[2777]: cswiki:  Done. Took 367 seconds.
[urbanecm@mwmaint1002 ~]$ mwscript extensions/GrowthExperiments/maintenance/updateMenteeData.php --wiki=cswiki
Done. Took 21 seconds.
[urbanecm@mwmaint1002 ~]$

In other words, that patch cuts cswiki's processing time from several minutes to about half a minute. Moving to CR – all code should be ready for review now.

Nice work!

Change 728655 merged by jenkins-bot:

[mediawiki/extensions/GrowthExperiments@master] updateMenteeData: Collect more profiling data

https://gerrit.wikimedia.org/r/728655

Change 728259 merged by jenkins-bot:

[mediawiki/extensions/GrowthExperiments@master] UncachedMenteeOverviewDataProvider::getFilteredMenteesForMentor: Cast IDs to ints

https://gerrit.wikimedia.org/r/728259

Change 729914 had a related patch set uploaded (by Urbanecm; author: Urbanecm):

[mediawiki/extensions/GrowthExperiments@wmf/1.38.0-wmf.3] UncachedMenteeOverviewDataProvider::getFilteredMenteesForMentor: Cast IDs to ints

https://gerrit.wikimedia.org/r/729914

Change 729915 had a related patch set uploaded (by Urbanecm; author: Urbanecm):

[mediawiki/extensions/GrowthExperiments@wmf/1.38.0-wmf.3] updateMenteeData: Collect more profiling data

https://gerrit.wikimedia.org/r/729915

Change 729914 merged by jenkins-bot:

[mediawiki/extensions/GrowthExperiments@wmf/1.38.0-wmf.3] UncachedMenteeOverviewDataProvider::getFilteredMenteesForMentor: Cast IDs to ints

https://gerrit.wikimedia.org/r/729914

Change 729915 merged by jenkins-bot:

[mediawiki/extensions/GrowthExperiments@wmf/1.38.0-wmf.3] updateMenteeData: Collect more profiling data

https://gerrit.wikimedia.org/r/729915

Mentioned in SAL (#wikimedia-operations) [2021-10-12T07:58:08Z] <urbanecm@deploy1002> Synchronized php-1.38.0-wmf.3/extensions/GrowthExperiments/: 17dc3aa, e0ca905, c0f4f4e: GrowthExperiments backports (T292224, T290609, T290609) (duration: 00m 59s)

Change 730158 had a related patch set uploaded (by Urbanecm; author: Urbanecm):

[mediawiki/extensions/GrowthExperiments@master] UncachedMenteeOverviewDataProvider: Reset state before calculating

https://gerrit.wikimedia.org/r/730158

Change 730161 had a related patch set uploaded (by Urbanecm; author: Urbanecm):

[mediawiki/extensions/GrowthExperiments@master] updateMenteeData: Summarize profiling data for all mentors

https://gerrit.wikimedia.org/r/730161

Moving back to In Progress. The data returned by profiling is not plausible.

Change 730191 had a related patch set uploaded (by Urbanecm; author: Urbanecm):

[mediawiki/extensions/GrowthExperiments@master] updateMenteeData: Switch profiling to microsecond precision

https://gerrit.wikimedia.org/r/730191

Change 730194 had a related patch set uploaded (by Urbanecm; author: Urbanecm):

[mediawiki/extensions/GrowthExperiments@master] [DNM] updateMenteeData: Measure time spent in provider/updating cache

https://gerrit.wikimedia.org/r/730194

I'm confused. I applied the https://gerrit.wikimedia.org/r/c/mediawiki/extensions/GrowthExperiments/+/730194/ dependency chain to mwmaint1002, and this is what I see:

[urbanecm@mwmaint1002 ~]$ mwscript extensions/GrowthExperiments/maintenance/updateMenteeData.php --wiki=cswiki --dbshard=s2 --verbose
Profiling data:
  * filtermentees: 0 seconds
  * edittimestampinternal: 0 seconds
  * usernames: 0 seconds
  * reverted: 0 seconds
  * questions: 0 seconds
  * editcount: 0 seconds
  * registration: 0 seconds
  * blocks: 0 seconds
===============
Done. Took 13 seconds (2.3821144104004 us spent by updating cache, 8.8777799606323 us spent in provider).
[urbanecm@mwmaint1002 ~]$

13 seconds is of course more than acceptable, but I don't see what takes most of that time – the profiling found only about 10 microseconds (!) out of the 13 seconds the run took.
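
For reference, a minimal sketch of microsecond-precision section timing in plain PHP (hypothetical names, not the merged patch). Note that microtime( true ) returns seconds as a float with microsecond precision, so the printed unit must match:

// Hypothetical profiling sketch: time each named section and accumulate
// the elapsed seconds across calls.
$profilingInfo = [];

function profileSection( string $name, callable $fn, array &$profilingInfo ) {
	$start = microtime( true );
	$result = $fn();
	$profilingInfo[$name] = ( $profilingInfo[$name] ?? 0 ) + ( microtime( true ) - $start );
	return $result;
}

// Wrap each data-provider step; sums accumulate across all mentors.
profileSection( 'blocks', static function () {
	usleep( 1500 ); // stand-in for the real blocks query
	return [];
}, $profilingInfo );

foreach ( $profilingInfo as $section => $seconds ) {
	printf( "  * %s: %.6f seconds\n", $section, $seconds );
}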

Change 730194 abandoned by Urbanecm:

[mediawiki/extensions/GrowthExperiments@master] [DNM] updateMenteeData: Measure time spent in provider/updating cache

Reason:

https://gerrit.wikimedia.org/r/730194

Change 730158 merged by jenkins-bot:

[mediawiki/extensions/GrowthExperiments@master] UncachedMenteeOverviewDataProvider: Reset state before calculating

https://gerrit.wikimedia.org/r/730158

Change 730161 merged by jenkins-bot:

[mediawiki/extensions/GrowthExperiments@master] updateMenteeData: Summarize profiling data for all mentors

https://gerrit.wikimedia.org/r/730161

Change 730191 merged by jenkins-bot:

[mediawiki/extensions/GrowthExperiments@master] updateMenteeData: Switch profiling to microsecond precision

https://gerrit.wikimedia.org/r/730191

Change 730583 had a related patch set uploaded (by Urbanecm; author: Urbanecm):

[mediawiki/extensions/GrowthExperiments@wmf/1.38.0-wmf.4] UncachedMenteeOverviewDataProvider: Reset state before calculating

https://gerrit.wikimedia.org/r/730583

Change 730584 had a related patch set uploaded (by Urbanecm; author: Urbanecm):

[mediawiki/extensions/GrowthExperiments@wmf/1.38.0-wmf.3] UncachedMenteeOverviewDataProvider: Reset state before calculating

https://gerrit.wikimedia.org/r/730584

Change 730585 had a related patch set uploaded (by Urbanecm; author: Urbanecm):

[mediawiki/extensions/GrowthExperiments@wmf/1.38.0-wmf.3] updateMenteeData: Summarize profiling data for all mentors

https://gerrit.wikimedia.org/r/730585

Change 730726 had a related patch set uploaded (by Urbanecm; author: Urbanecm):

[mediawiki/extensions/GrowthExperiments@wmf/1.38.0-wmf.4] updateMenteeData: Summarize profiling data for all mentors

https://gerrit.wikimedia.org/r/730726

Change 730720 had a related patch set uploaded (by Urbanecm; author: Urbanecm):

[mediawiki/extensions/GrowthExperiments@wmf/1.38.0-wmf.3] updateMenteeData: Switch profiling to microsecond precision

https://gerrit.wikimedia.org/r/730720

Change 730583 merged by jenkins-bot:

[mediawiki/extensions/GrowthExperiments@wmf/1.38.0-wmf.4] UncachedMenteeOverviewDataProvider: Reset state before calculating

https://gerrit.wikimedia.org/r/730583

Change 730726 merged by jenkins-bot:

[mediawiki/extensions/GrowthExperiments@wmf/1.38.0-wmf.4] updateMenteeData: Summarize profiling data for all mentors

https://gerrit.wikimedia.org/r/730726

Change 730584 merged by jenkins-bot:

[mediawiki/extensions/GrowthExperiments@wmf/1.38.0-wmf.3] UncachedMenteeOverviewDataProvider: Reset state before calculating

https://gerrit.wikimedia.org/r/730584

Change 730585 merged by jenkins-bot:

[mediawiki/extensions/GrowthExperiments@wmf/1.38.0-wmf.3] updateMenteeData: Summarize profiling data for all mentors

https://gerrit.wikimedia.org/r/730585

Change 730720 merged by jenkins-bot:

[mediawiki/extensions/GrowthExperiments@wmf/1.38.0-wmf.3] updateMenteeData: Switch profiling to microsecond precision

https://gerrit.wikimedia.org/r/730720

Mentioned in SAL (#wikimedia-operations) [2021-10-14T10:33:24Z] <urbanecm@deploy1002> Synchronized php-1.38.0-wmf.3/extensions/GrowthExperiments/: 465b564, a8cc98b, 6e95c48: GrowthExperiments backports (T290609) (duration: 01m 06s)

Mentioned in SAL (#wikimedia-operations) [2021-10-14T10:35:16Z] <urbanecm@deploy1002> Synchronized php-1.38.0-wmf.4/extensions/GrowthExperiments/: 1f33fc3, e0ea1b8, cba2ac9: GrowthExperiments backports (T290609) (duration: 01m 05s)

Change 728656 merged by Jcrespo:

[operations/puppet@production] updatementeedata.pp: Update script parameters

https://gerrit.wikimedia.org/r/728656

Change 725286 merged by Jbond:

[operations/puppet@production] growthexperiments: Remove absented systemd job

https://gerrit.wikimedia.org/r/725286

[urbanecm@mwmaint1002 ~/updatementeedata-measure]$ cat updatementeedata-all-but-enwiki.sh
#!/bin/bash

for i in {2..8}; do
        foreachwikiindblist "growthexperiments & s$i" extensions/GrowthExperiments/maintenance/updateMenteeData.php --force --verbose > updatementeedata-$i.log &
done
wait
[urbanecm@mwmaint1002 ~/updatementeedata-measure]$ cat updatementeedata-all.sh
#!/bin/bash

for i in {1..8}; do
        foreachwikiindblist "growthexperiments & s$i" extensions/GrowthExperiments/maintenance/updateMenteeData.php --force --verbose > updatementeedata-$i.log &
done
wait
[urbanecm@mwmaint1002 ~/updatementeedata-measure]$ time bash updatementeedata-all-but-enwiki.sh

real    4m37.078s
user    3m26.952s
sys     0m27.330s
[urbanecm@mwmaint1002 ~/updatementeedata-measure]$ time bash updatementeedata-all.sh

real    6m48.517s
user    4m35.842s
sys     0m34.360s
[urbanecm@mwmaint1002 ~/updatementeedata-measure]$

I call this resolved -- less than 7 minutes for all Growth wikis is more than fast enough.