MassMessage failed delivery claiming "readonly" although the page is not protected
Open, Needs Triage · Public

Description

MassMessage regularly fails with a readonly error code on some of the target wikis, for no obvious reason. For large target sets there can be hundreds of failures.

The failure is only logged on the local wiki and is not exposed to the sender, which makes these errors hard to notice and cumbersome to track down (T139380#4460313 has a workaround).

Danny_B created this task. Jul 5 2016, 4:21 PM
Restricted Application added subscribers: Zppix, Aklapper. Jul 5 2016, 4:21 PM

readonly means the database is locked, not that the page is protected (a protected page would produce an error like "protectedpage"). MassMessage should probably back off and then retry, like we do for edit conflicts.
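To illustrate the back-off-and-retry idea, here is a minimal sketch in Python. It is not MassMessage's actual code: `deliver_with_backoff`, `attempt_delivery`, `RETRYABLE_CODES`, and the delay values are all hypothetical names and numbers chosen for the example. It assumes a delivery attempt reports either success or an error code string such as "readonly".

```python
import time
from typing import Callable, Optional

# Assumption: only the database-lock error is treated as transient.
RETRYABLE_CODES = {"readonly"}


def deliver_with_backoff(
    attempt_delivery: Callable[[], Optional[str]],
    max_attempts: int = 5,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Optional[str]:
    """Retry a delivery while it fails with a retryable error code.

    attempt_delivery is a hypothetical callable that returns None on
    success or an error code string on failure. The return value is
    None on success, otherwise the last error code seen.
    """
    error: Optional[str] = None
    for attempt in range(max_attempts):
        error = attempt_delivery()
        # Stop on success, or on an error that retrying cannot fix
        # (for example "protectedpage").
        if error is None or error not in RETRYABLE_CODES:
            return error
        # Wait 1x, 2x, 4x, ... the base delay, but not after the last try.
        if attempt < max_attempts - 1:
            sleep(base_delay * (2 ** attempt))
    return error
```

The `sleep` parameter is injectable only so the waiting can be observed or skipped in tests; a real job queue would more likely re-enqueue the job with a delay than block a worker.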

Tgr added a subscriber: Tgr. Jul 23 2018, 8:00 PM

Made extra annoying by the inability to see whether message delivery was successful (short of looking through massmessage logs on hundreds of wikis).

Tgr added a comment. Edited · Jul 29 2018, 8:18 PM

Here's a script to at least query which wikis had errors, given the UTC date and edit summary of the delivery (needs jq and GNU Parallel to be installed):

export LOGS_DAY='2018-07-12'
export LOGS_SUBJECT="Consultation on the creation of a separate user group for editing sitewide CSS/JS"
curl -s 'https://meta.wikimedia.org/w/api.php?action=sitematrix&format=json&smtype=special%7Clanguage&smstate=all&smlangprop=site&smsiteprop=url&smlimit=max' |
  jq --raw-output '(.sitematrix[]["site"]?[] | select(has("closed") or has("private") or has("fishbowl") or has("nonglobal") | not).url), (.sitematrix.specials[].url)' |
  parallel --no-notice -P5 -I @ "curl -s '@/w/api.php?action=query&format=json&list=logevents&leprop=ids%7Ctitle%7Cdetails%7Ctimestamp%7Ctype&letype=massmessage&lestart=${LOGS_DAY}T00%3A00%3A00.000Z&leend=${LOGS_DAY}T23%3A59%3A59.000Z&ledir=newer' | jq --raw-output '.query.logevents[]? | select(.params.subject == \"${LOGS_SUBJECT}\") | \"@/w/index.php?title=Special:Log&type=massmessage&offset=\(.timestamp | fromdateiso8601 | . - 1 | strftime(\"%Y%m%d%H%M%S\") )&dir=prev&limit=1 \(.logid) \(.params.reason // .action // \"\")\"'"
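For readers who find the nested quoting hard to follow, the per-wiki filtering step (the second jq call) can be sketched in Python. This is only an illustration of what that filter does to an already-fetched logevents response. The function name `matching_log_lines` is hypothetical, and the field names are taken from the script above rather than checked against the API.

```python
from datetime import datetime, timedelta
from typing import Any, Dict, List


def matching_log_lines(wiki_url: str, response: Dict[str, Any], subject: str) -> List[str]:
    """Return one "log URL, log id, reason" line per matching log entry.

    response is the parsed JSON of a list=logevents query; entries whose
    params.subject differs from the given subject are skipped.
    """
    lines = []
    for event in response.get("query", {}).get("logevents", []):
        params = event.get("params", {})
        if params.get("subject") != subject:
            continue
        # Start the Special:Log view one second before the entry, as the
        # jq expression does with "fromdateiso8601 | . - 1".
        stamp = datetime.strptime(event["timestamp"], "%Y-%m-%dT%H:%M:%SZ")
        offset = (stamp - timedelta(seconds=1)).strftime("%Y%m%d%H%M%S")
        # Mirror ".params.reason // .action // \"\"": first non-null value.
        detail = params.get("reason")
        if detail is None:
            detail = event.get("action")
        if detail is None:
            detail = ""
        lines.append(
            f"{wiki_url}/w/index.php?title=Special:Log&type=massmessage"
            f"&offset={offset}&dir=prev&limit=1 {event['logid']} {detail}"
        )
    return lines
```

Each returned line has the same shape as the shell output: a link to the log entry, the log id, and the error code (for example readonly) when one is recorded.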

I got a bunch of readonly errors (~200, apparently) around 2018-07-12 08:45, and there is nothing relevant in the SAL (Server Admin Log), so I doubt these errors correspond to a real read-only period.

Tgr added a comment. Jul 30 2018, 2:01 PM

Nearly 300 errors this time, again nothing in SAL. These were my only two attempts to send mass messages, so I'm pretty sure something is broken there.

KaMan added a subscriber: KaMan. Oct 5 2018, 3:37 AM

Crossposting a theory here:

I wonder if there's something in the code that CommRel people are using?

The only other message that failed there was again from the team:
12:03, 4 October 2018 Delivery of "Reminder: No editing for up to an hour on 10 October" to Wikibooks:Reading room/General failed with an error code of readonly.

The page is getting other MMs: https://en.wikibooks.org/w/index.php?title=Wikibooks:Reading_room/General&action=history .

(This is also true for https://en.wikiversity.org/w/index.php?title=Special:Log&page=Wikiversity%3AColloquium , FWIW.)

Tgr updated the task description. Wed, Jan 16, 8:13 PM