Page MenuHomePhabricator

MassMessage doesn't work on dty.wikipedia
Closed, ResolvedPublicPRODUCTION ERROR

Description

As reported on this thread, MassMessage doesn't seem to be working on dty.wikipedia.

Suspicion is that it is due to the fact that this imported page already had an edit of <code>MediaWiki message delivery</code> causing conflict. In fact if we try to browse the local contributions of that account, while the imported edit appears, there's a message at the top saying that there's no account registered with that name.

Testing has involved the use of the global and local message delivery system:

  • Global: sending a message from Meta-Wiki to my talk page to dtywiki resulted in correct preview and sending. No error messages but the message never arrived. I guess some sort of log entry indicating that an error has occured should've been printed in the logs.
  • Local: trying to use MassMessage locally at dtywiki fails in the preview phase with this exception: [Wo72rApAMFQAAHYJKwsAAABG] 2018-02-22 16:58:20: Excepción grave de tipo "UnexpectedValueException".

My guess is that's happening due to that conflicting edit there.

Solutions I see:

  • give me green light and I'll locally rename MediaWiki message delivery at dty.wikipedia to something else, leaving that account free and ready for the system to use it, or;
  • do something else server/DB side.

Possible other issues:
An investigation should happen to see if this problem exists elsewhere as well. A maintenance script should then be coded to fix those in order for the service be effective. Also, errors should be printed in the mass message log for us to know, instead of silently failing.

Thank you.

Event Timeline

Due to Wikimedia production impact.

Asked to look at the trace by @MarcoAurelio

Error message:

[Wo72rApAMFQAAHYJKwsAAABG] /wiki/%E0%A4%AC%E0%A4%BF%E0%A4%B6%E0%A5%87%E0%A4%B7:MassMessage   UnexpectedValueException from line 170 of /srv/mediawiki/php-1.31.0-wmf.21/includes/user/UserGroupMembership.php: UserGroupMembership::insert() needs a positive user ID. Did you forget to add your User object to the database before calling addGroup()?

Trace:

#0 /srv/mediawiki/php-1.31.0-wmf.21/includes/user/User.php(3463): UserGroupMembership->insert(boolean)
#1 /srv/mediawiki/php-1.31.0-wmf.21/extensions/MassMessage/includes/MassMessage.php(44): User->addGroup(string)
#2 /srv/mediawiki/php-1.31.0-wmf.21/extensions/MassMessage/includes/SpecialMassMessage.php(307): MediaWiki\MassMessage\MassMessage::getMessengerUser()
#3 /srv/mediawiki/php-1.31.0-wmf.21/extensions/MassMessage/includes/SpecialMassMessage.php(206): MediaWiki\MassMessage\SpecialMassMessage->preview(array)
#4 /srv/mediawiki/php-1.31.0-wmf.21/includes/htmlform/HTMLForm.php(671): MediaWiki\MassMessage\SpecialMassMessage->callback(array, HTMLForm)
#5 /srv/mediawiki/php-1.31.0-wmf.21/includes/htmlform/HTMLForm.php(563): HTMLForm->trySubmit()
#6 /srv/mediawiki/php-1.31.0-wmf.21/extensions/MassMessage/includes/SpecialMassMessage.php(89): HTMLForm->tryAuthorizedSubmit()
#7 /srv/mediawiki/php-1.31.0-wmf.21/includes/specialpage/SpecialPage.php(522): MediaWiki\MassMessage\SpecialMassMessage->execute(NULL)
#8 /srv/mediawiki/php-1.31.0-wmf.21/includes/specialpage/SpecialPageFactory.php(579): SpecialPage->run(NULL)
#9 /srv/mediawiki/php-1.31.0-wmf.21/includes/MediaWiki.php(288): SpecialPageFactory::executePath(Title, RequestContext)
#10 /srv/mediawiki/php-1.31.0-wmf.21/includes/MediaWiki.php(861): MediaWiki->performRequest()
#11 /srv/mediawiki/php-1.31.0-wmf.21/includes/MediaWiki.php(524): MediaWiki->main()
#12 /srv/mediawiki/php-1.31.0-wmf.21/index.php(42): MediaWiki->run()
#13 /srv/mediawiki/w/index.php(3): include(string)
#14 {main}

Yeah, user conflict there as expected. Can I try the local usurp here or want to fix this by other means? Ping @Legoktm

I think the script above should have resolved this issue. If we fetch some data from the diff made by MediaWiki message delivery, the result is:

MariaDB [dtywiki_p]> select rev_id,rev_user,rev_user_text from revision where rev_id = 73185;
+--------+----------+----------------------------+
| rev_id | rev_user | rev_user_text              |
+--------+----------+----------------------------+
|  73185 |        0 | MediaWiki message delivery |
+--------+----------+----------------------------+
1 row in set (0.00 sec)

@Anomie Maybe you could run again the script there at dtywiki and see if it fixes this? The rev_id edit of above is imported from maiwiki and the rev_user = 0, so the script should've caught this afaics.

@Anomie Maybe you could run again the script there at dtywiki and see if it fixes this? The rev_id edit of above is imported from maiwiki and the rev_user = 0, so the script should've caught this afaics.

I don't think it will, but I'll give it a try anyway.

Trimmed output
anomie@terbium:~$ mwscript cleanupUsersWithNoId.php --wiki=dtywiki --force -p imported --assign
Beginning cleanup of revision
Completed cleanup, assigned 0 and prefixed 0 row(s)
Beginning cleanup of archive
Completed cleanup, assigned 0 and prefixed 0 row(s)
Beginning cleanup of logging
Completed cleanup, assigned 0 and prefixed 0 row(s)
Beginning cleanup of image
Completed cleanup, assigned 0 and prefixed 0 row(s)
Beginning cleanup of oldimage
Completed cleanup, assigned 0 and prefixed 0 row(s)
Beginning cleanup of filearchive
Completed cleanup, assigned 0 and prefixed 0 row(s)
Beginning cleanup of ipblocks
Completed cleanup, assigned 0 and prefixed 0 row(s)
Beginning cleanup of recentchanges
Completed cleanup, assigned 158 and prefixed 0 row(s)

That "assigned 158" for recentchanges is a surprise to me, but is not related to this task (see T183019#3996390). As you can see, it didn't make any changes to the revision table.

The real problem is that UnexpectedValueException. I think I know why it's occurring, I'll have to look into it a little more and write a patch.

Change 413767 had a related patch set uploaded (by Anomie; owner: Anomie):
[mediawiki/core@master] Have User::createNew() load the object from master

https://gerrit.wikimedia.org/r/413767

Change 413768 had a related patch set uploaded (by Anomie; owner: Anomie):
[mediawiki/core@master] Force READ_LATEST for User::newFromId() if writes had been done

https://gerrit.wikimedia.org/r/413768

The real problem is that UnexpectedValueException. I think I know why it's occurring, I'll have to look into it a little more and write a patch.

What seems to be going on is replica lag, or possibly a transaction somehow getting opened hiding the insert from the replica. The order of events would be something along the lines of

  • User::newSystemUser() finds the user doesn't exist in the DB, and calls User::createNew() to create it (code).
  • User::createNew() adds the user to the database, and uses User::newFromId() with the just-inserted ID to create the User object to return (code).
  • When some attempt to access any other User property triggers a call to User::load(), it'll use READ_NORMAL. So load() calls loadFromId() which calls loadFromCache(), which presumably finds a cache miss and calls loadFromDatabase(), which doesn't fall back to master if the row isn't found in the replica, so it instead loads the "default" anonymous user (ID = 0).

I see two potential fixes here.

  • User::createNew() should return a fully-loaded User object. Use User::newFromRow( (object)$fields ), or reselect the row and use User::newFromRow(), or at least call ->load( self::READ_LATEST ) to force it to load from master.
  • User::loadFromDatabase() could fall back to master if the row isn't found and wfGetLB()->hasOrMadeRecentMasterChanges() is true. Or have User::load() force master for the 'id' case like it already does for 'name'.

May as well do both: https://gerrit.wikimedia.org/r/413767 does the first, and https://gerrit.wikimedia.org/r/413768 does the second.

Change 413767 merged by jenkins-bot:
[mediawiki/core@master] Have User::createNew() load the object from master

https://gerrit.wikimedia.org/r/413767

Change 413768 merged by jenkins-bot:
[mediawiki/core@master] Force READ_LATEST for User::newFromId() if writes had been done

https://gerrit.wikimedia.org/r/413768

Thanks for the fixes @Anomie - I'll wait until they're deployed everywhere in the train before closing this task, unless you think they're cherry-pick-able to the current wmf branch. Regards.

It seems we're having problems with the MediaWiki train as group1 wikis had to be rolled back to wmf.22. I'll keep this open but I'm tempted to cherry-pick to wmf.22.

I saw, the status "Merged" and tried sending the message. It worked but actually didn't (rest 8 users didn't received). There is still problem alive. :(

@Anomie Since the patches are now live on all production wikis (wmf.23 everywhere) do we need to run the script here again? Thanks.

No, the script didn't do anything here in the first place.

I see 9 users in the list and 9 messages at https://dty.wikipedia.org/wiki/Special:Contributions/MediaWiki_message_delivery - what is exactly the problem?

Oh, Sorry ! No problem. Thanks a bunch !! @MarcoAurelio & @Anomie :)

MarcoAurelio assigned this task to Anomie.
MarcoAurelio removed a project: Patch-For-Review.

You're welcome.

mmodell changed the subtype of this task from "Task" to "Production Error".Aug 28 2019, 11:09 PM