Page MenuHomePhabricator

Homepage mentor is not stored persistently at Romanian Wikipedia
Closed, ResolvedPublic

Description

Bug description

Mentor of users at the Romanian Wikipedia is apparently not stored permanently in some or all cases.

Screencast

I managed to reproduce this via my WMF account. The mentor sometimes changed, and sometimes did not change. Unclear whether there are any steps to reproduce.

Impact

Unstability of the mentorship feature at ro.wikipedia (deployed to experienced users only now). Unclear whether it happens at any other wiki as well.

User report

See my ro.wikipedia talk page.

Event Timeline

Urbanecm_WMF renamed this task from Homepage mentor is not persistent at Romanian Wikipedia to Homepage mentor is not stored persistently at Romanian Wikipedia.Feb 22 2021, 8:10 PM

Checked rowiki - the issue is confirmed. When Special:Homepage reloads, the Mentor gets reassigned.

This ticket is referenced from a ticket about problems with job execution. Which job is responsible for this feature? I see GrowthExperiments have 2 jobs, but I don't know the codebase enough to understand which of the 2 jobs is responsible for this feature.

This ticket is referenced from a ticket about problems with job execution. Which job is responsible for this feature? I see GrowthExperiments have 2 jobs, but I don't know the codebase enough to understand which of the 2 jobs is responsible for this feature.

This feature is backed up by userOptionsUpdate (which is a core job, AFAIK).

Ok, looking in the logs, I see steady stream of userOptionsUpdate successful executions. I can even find some for rowiki.

See mwlog1001:/srv/mw-log/JobExecutor.log | grep userOptionsUpdate

Change 666211 had a related patch set uploaded (by Kosta Harlan; owner: Kosta Harlan):
[mediawiki/extensions/GrowthExperiments@master] Ensure consistent access to on-the-fly-generated Mentor

https://gerrit.wikimedia.org/r/666211

Ok, looking in the logs, I see steady stream of userOptionsUpdate successful executions. I can even find some for rowiki.

See mwlog1001:/srv/mw-log/JobExecutor.log | grep userOptionsUpdate

Hmm, I don't see userOptionsUpdate jobs for rowiki:

cat /srv/mw-log/JobExecutor.log | grep userOptionsUpdate | grep rowiki

The log is very very large, maybe you didn't wait for it to get to it? I don't see many by there were some. For example:

2021-02-22 21:14:48 [YDQIz4YmPMjJX4De42kloAAAAFA] mw1336 rowiki 1.36.0-wmf.31 JobExecutor INFO: Finished job execution {"job":"userOptionsUpdate Special: userId=484812 options={\"growthexperiments-mentor-id\":467397} requestId=YDQIz4YmPMjJX4De42kloAAAAFA","job_type":"userOptionsUpdate","job_status":true,"job_duration":0.24150991439819336}

JobQueue is not split by wiki, all wikis are processed in the same queue, so it would be very strange if rowiki had some special treatment.

The log is very very large, maybe you didn't wait for it to get to it? I don't see many by there were some. For example:

2021-02-22 21:14:48 [YDQIz4YmPMjJX4De42kloAAAAFA] mw1336 rowiki 1.36.0-wmf.31 JobExecutor INFO: Finished job execution {"job":"userOptionsUpdate Special: userId=484812 options={\"growthexperiments-mentor-id\":467397} requestId=YDQIz4YmPMjJX4De42kloAAAAFA","job_type":"userOptionsUpdate","job_status":true,"job_duration":0.24150991439819336}

JobQueue is not split by wiki, all wikis are processed in the same queue, so it would be very strange if rowiki had some special treatment.

How long does it take for an event to show up? I had tail -f /srv/mw-log/JobExecutor.log | grep userOptionsUpdate | grep rowiki and refreshed Special:Homepage on rowiki a few times, which should enqueue some userOptionsUpdate jobs, but after a minute nothing appeared.

I see the same issue on Bengali Wikipedia. If I look at the functions called while serving Special:Homepage (which is when the mentor is assigned if one isn't already created), I can see that the job is enqueued https://performance.wikimedia.org/xhgui/run/symbol?id=60342bd57990ecfa3d2e239d&symbol=GrowthExperiments%5CMentorship%5CMentorPageMentorManager%3A%3AsetMentorForUser

Anyway, maybe it's an issue with the userSaveOptions job not getting enqueued for some reason? Because the code to lazy push the job is definitely getting called. If anyone wants to reproduce, you can go to https://ro.wikipedia.org/wiki/Special:Preferin%C8%9Be?uselang=en, enable "Display newcomer homepage", and then visit https://ro.wikipedia.org/wiki/Special:Homepage, as you refresh you'll see a different mentor each time.

Hm.. interesting. In order to see the jobs actually being scheduled, you need to go to mwmaint1002 and do

kafkacat -b kafka-main1001.eqiad.wmnet -t "eqiad.mediawiki.job.userOptionsUpdate" -o -1000 | grep rowiki
  • that's as real time as it possibly can be.

Doing this, I can see the jobs you just submitted:

{"$schema":"/mediawiki/job/1.0.0","meta":{"uri":"https://placeholder.invalid/wiki/Special:Badtitle","request_id":"YDQrj9N4i4WDGaQ59cNDrgAAAIk","id":"a5a64f5f-15f7-4180-85c8-06c43500a042","dt":"2021-02-22T22:09:20Z","domain":"ro.wikipedia.org","stream":"mediawiki.job.userOptionsUpdate"},"database":"rowiki","type":"userOptionsUpdate","sha1":"51af42d64cacebb5909760c4338ebffd856dd2d6","params":{"userId":548006,"options":{"growthexperiments-mentor-id":438919},"requestId":"YDQrj9N4i4WDGaQ59cNDrgAAAIk"},"mediawiki_signature":"5c37bd7551de65fb6bcb09b14cbebc35cd4da0a9"}

but indeed I can not find reference to this job being executed in executor log.

I just ran the command @Pchelolo suggested, and I'm indeed seeing the expected jobs there. However, the job is apparently not getting executed for quite a long time, judging by the following query:

1mysql:research@dbstore1003.eqiad.wmnet [rowiki]> select count(*) from user_properties where up_property="growthexperiments-mentor-id";
2+----------+
3| count(*) |
4+----------+
5| 1 |
6+----------+
71 row in set (0.001 sec)
8
9mysql:research@dbstore1003.eqiad.wmnet [rowiki]>

There is only one user (see P14446 [note this is a private paste with confidential data; let me know if you need access]) with this user property, despite it is supposed to be set by the job. Various engineers from the Growth team and multiple communities tried out the homepage when this issue was detected, the fact the count is still zero is somehow concerning.

So, looking at a different graph, we can see that there's indeed some backlog of userOptionsChange jobs. Lemme restart change-prop, just to see.

https://ro.wikipedia.org/wiki/Special:Jurnal?type=growthexperiments&user=NGC+54&page=&wpdate=&tagfilter=
But my mentor is not NGC 54 (even if I am NGC 54). And my mentor was changed again (even if it I thought that it is a temporary issue: https://ro.wikipedia.org/w/index.php?title=Discu%C8%9Bie_Utilizator:Martin_Urbanec_(WMF)&diff=prev&oldid=13947645&diffmode=source). I have done something wrong at the list of mentors? However, after some hours, NGC 54 (me) is displayed as my mentor.

Change 666317 had a related patch set uploaded (by Urbanecm; owner: Urbanecm):
[mediawiki/extensions/GrowthExperiments@master] MentorManager: Do not use job queue for saving mentor when in POST request

https://gerrit.wikimedia.org/r/666317

I tested with another account (NGC Galaxie 56) and the mentor function for NGC Galaxie 56 was consistent (I reloaded the Homepage, and the mentor was the same; so no issue).

It looks like the restart of change-prop (T275429#6850830) fixed this. There will be some more investigation on the root cause of the job queue issues, per T275437#6850975

@Urbanecm_WMF I think we could mark this as resolved, tentatively?

1mysql:research@dbstore1003.eqiad.wmnet [rowiki]> select count(*) from user_properties where up_property="growthexperiments-mentor-id";
2+----------+
3| count(*) |
4+----------+
5| 10 |
6+----------+
71 row in set (0.001 sec)
8
9mysql:research@dbstore1003.eqiad.wmnet [rowiki]>

Agreed, this is much better :). Confirmed via calling new mw.Api().saveOption('growthexperiments-mentor-id', null).done( function() { window.location.reload() }); with homepage loaded that the mentor is stored permanently now.

@kostajh No objection to closing, but we should either untag https://gerrit.wikimedia.org/r/c/mediawiki/extensions/GrowthExperiments/+/666211 and track it elsewhere or abandon it.

@kostajh No objection to closing, but we should either untag https://gerrit.wikimedia.org/r/c/mediawiki/extensions/GrowthExperiments/+/666211 and track it elsewhere or abandon it.

Untagged the patch. I'll leave for @Etonkovidova to resolve the task.

Etonkovidova claimed this task.

Checked on rowiki wmf.31 and testwiki wmf.32 - all seems to be back to normal. A mentor once assigned is not changed with reloading the page or any user action (except for "Restore all default settings" in Prefeences).

(clearing assignee as no one actually fixed this one)