
MassMessage failed delivery claiming "readonly" although the page is not protected
Open, High, Public

Description

MassMessage regularly fails with a readonly error code on some of the target wikis, for no obvious reason. For large target sets there can be hundreds of failures.

The failure is only logged on the local wiki and is not exposed to the sender, which makes these errors hard to notice and cumbersome to identify (T139380#4460313 has a workaround for the latter).


(Merged duplicate description:)
Over the last few days, we've had reports of MassMessage not delivering messages to everyone on the target list.

For T213864, two hours after a message was sent out (14:47 UTC, 16 January 2019) to https://meta.wikimedia.org/w/index.php?oldid=18788945 a manual check shows that some of the targets have received the message, but multiple wikis have not. The queue is said to be empty.

At https://meta.wikimedia.org/wiki/Talk:Tech/News/2019/03 @IKhitron has reported messages not being delivered (seen on he.wp, where 1 user out of 11 received the issue).


See comments below for many examples and logs


Possibly related (?) T214712: MassMessage delivery of Tech News failing in wikitext on Hebrew Wikipedia
Another duplicate (?) T180378: simplewiki has a lot of MassMessage failures due to "readonly"

Event Timeline

Danny_B created this task. Jul 5 2016, 4:21 PM
Restricted Application added subscribers: Zppix, Aklapper. Jul 5 2016, 4:21 PM

readonly means the database is locked, not that the page is protected (that error would be something like "protectedpage"). MassMessage should probably back off and then retry, as we do for edit conflicts.
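To make the back-off-and-retry suggestion concrete, here is a minimal shell sketch. It is not MassMessage code (the extension is PHP and runs through the job queue); `retry_on_readonly`, its arguments, and the `SLEEP_CMD` override are hypothetical names chosen for illustration. It assumes the delivery command prints an error code such as `readonly` on stdout.

```shell
#!/usr/bin/env bash
# Hypothetical sketch: retry a delivery command with exponential backoff
# for as long as it reports the "readonly" error code.
#
# Usage: retry_on_readonly MAX_ATTEMPTS INITIAL_DELAY_SECONDS COMMAND [ARGS...]
# COMMAND is assumed to print a single result code on stdout.
retry_on_readonly() {
    local max_attempts=$1 delay=$2
    shift 2
    local attempt=1 result
    while :; do
        result=$("$@")
        # Any result other than "readonly" (success or a different error)
        # is passed through unchanged; only "readonly" triggers a retry.
        if [ "$result" != "readonly" ]; then
            printf '%s\n' "$result"
            return 0
        fi
        # Give up once the attempt budget is used, reporting the last result.
        if [ "$attempt" -ge "$max_attempts" ]; then
            printf '%s\n' "$result"
            return 1
        fi
        # SLEEP_CMD can be overridden (for example in tests) to avoid waiting.
        "${SLEEP_CMD:-sleep}" "$delay"
        delay=$((delay * 2))
        attempt=$((attempt + 1))
    done
}
```

Doubling the delay gives a briefly locked database time to become writable again without hammering it. The attempt limit keeps a wiki that is read-only for a long time from blocking delivery indefinitely.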

Tgr added a subscriber: Tgr. Jul 23 2018, 8:00 PM

Made extra annoying by the inability to see whether message delivery was successful (short of looking through massmessage logs on hundreds of wikis).

Tgr added a comment. Edited Jul 29 2018, 8:18 PM

Here's a script to at least query which wikis had errors, given the UTC date and edit summary of the delivery (needs jq and GNU Parallel to be installed):

export LOGS_DAY='2018-07-12'; export LOGS_SUBJECT="Consultation on the creation of a separate user group for editing sitewide CSS/JS"; curl -s 'https://meta.wikimedia.org/w/api.php?action=sitematrix&format=json&smtype=special%7Clanguage&smstate=all&smlangprop=site&smsiteprop=url&smlimit=max' | jq --raw-output '(.sitematrix[]["site"]?[] | select(has("closed") or has ("private") or has("fishbowl") or has("nonglobal") | not).url), (.sitematrix.specials[].url)' | parallel --no-notice -P5 -I @ "curl -s '@/w/api.php?action=query&format=json&list=logevents&leprop=ids%7Ctitle%7Cdetails%7Ctimestamp|type&letype=massmessage&lestart=${LOGS_DAY}T00%3A00%3A00.000Z&leend=${LOGS_DAY}T23%3A59%3A59.000Z&ledir=newer' | jq --raw-output '.query.logevents[]? | select(.params.subject == \"${LOGS_SUBJECT}\") | \"@/w/index.php?title=Special:Log&type=massmessage&offset=\(.timestamp | fromdateiso8601 | . - 1 | strftime(\"%Y%m%d%H%M%S\") )&dir=prev&limit=1 \(.logid) \(.params.reason // .action // \"\")\"'"
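The script above prints one line per matching log entry: the log URL, the log ID, and the failure reason (falling back to the log action). With hundreds of lines, a per-reason tally is easier to read. The helper below is a small sketch for that; `tally_by_reason` is a hypothetical name, and it assumes the three space-separated fields described above.

```shell
#!/usr/bin/env bash
# Hypothetical helper: count the script's output lines by their third
# field (the failure reason, or the log action when no reason is set).
# Reads lines of the form "<log URL> <log id> <reason>" on stdin and
# prints "<count> <reason>", most frequent first.
tally_by_reason() {
    # Lines with fewer than three fields (empty reason) are skipped.
    awk 'NF >= 3 { count[$3]++ } END { for (r in count) print count[r], r }' |
        sort -k1,1nr -k2,2
}
```

For example, redirect the script's output to a file (here a made-up `results.txt`) and run `tally_by_reason < results.txt` to see how many entries were `readonly` compared with other codes.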

I got a bunch of readonly errors (~200, apparently) around 2018-07-12 08:45, and there is nothing relevant in the SAL, so I suspect these errors are anomalous.

Tgr added a comment. Jul 30 2018, 2:01 PM

Nearly 300 errors this time, again nothing in SAL. These were my only two attempts to send mass messages, so I'm pretty sure something is broken there.

KaMan added a subscriber: KaMan. Oct 5 2018, 3:37 AM

Crossposting a theory here:

I wonder if there's something in the code that CommRel people are using?

The only other message that failed there was again from the team,
12:03, 4 October 2018 Delivery of "Reminder: No editing for up to an hour on 10 October" to Wikibooks:Reading room/General failed with an error code of readonly.

The page is getting other MMs: https://en.wikibooks.org/w/index.php?title=Wikibooks:Reading_room/General&action=history .

(This is also true for https://en.wikiversity.org/w/index.php?title=Special:Log&page=Wikiversity%3AColloquium , FWIW.)

Tgr updated the task description. Jan 16 2019, 8:13 PM
Quiddity triaged this task as High priority. Sun, Feb 17, 7:54 PM

When this bug occurs, it severely impairs the tool and causes a lot of extra work for MassMessage senders. Triaging to 'high'.

Quiddity updated the task description. Sun, Feb 17, 8:05 PM