Page MenuHomePhabricator

Use of Special:NotifyTranslators is not being logged
Closed, DuplicatePublicPRODUCTION ERROR

Description

Translation Notification Bot at Meta-Wiki is sending notifications but it's not adding log entries. Last notifications were sent on 10 March but no log entry for them has been added. Not sure when this stopped working but last notification that was logged was on March 5; and after that no other one.

Please fix and add the missing log entries to the log.

Thank you.

Event Timeline

MarcoAurelio triaged this task as Unbreak Now! priority.Mar 12 2017, 12:47 PM
MarcoAurelio added subscribers: Vogone, Samuele2002.

Here are screenshots from me trying to notify translators for https://meta.wikimedia.org/wiki/Strategy/Wikimedia_movement/2017/Organize/Discussion_Coordinator_Role page: http://imgur.com/a/VClqP The third try with that db error was effectively successful (at least wiki wise, I have no means to check whether users receiving those invitations over email got notified.)
(I planned to file this task but did not get to it)

Could you please post the exact error messages here so it could be investigated in Wikimedia-Logstash?

Advice: once you try to notify translators, if there are many to notify the page may show you a timeout error. Do not attempt to re-send the notification as they're already queued, and reloading the page will just make the translators receive the notification more than once, which is annoying.

Also, the bot is blocked at Meta. I don't think this should be used if its use is not being logged as it used to be.

In that case I checked user talk pages of users I know that they are receiving the invitations in order to check if the invitation went through.
I have only screenshots now, so the exact error ids can only be retyped from there, as for the messages in Ukrainian those probably are
MediaWiki:Internalerror/uk
MediaWiki:Internalerror-fatal-exception/uk
for the former two screenshots and
MediaWiki:Databaseerror/uk
MediaWiki:Transaction-duration-limit-exceeded/uk
for the second

Note: I just used Special:SearchTranslations on TWN now rather than &uselang=qqx the moment I got those, so I cannot be 100% sure those are exactly those messages or some other with the same text, but most probably those are the ones.

As for the usage, @MarcoAurelio I think we can set up some manual logging I guess we only need to notify all translate admins about it somehow. Preferably by putting some notice on the respective special page. What do you think?

@Base: problem is that the screenshots are rather small that I can't read the exception codes. If you could have them posted here that'll help the developer who wishes to investigate this. Thanks.

They are not small, it is just that imgur made it a bit tricky to find links to full size version lately :)
http://i.imgur.com/kCROyEd.png
http://i.imgur.com/HTO8GrQ.png
http://i.imgur.com/ToMhiQF.png
hope they are readable enough this way

This is a severe issue, but not especially urgent. Only the block is causing issues.

Nemo_bis lowered the priority of this task from Unbreak Now! to High.Mar 19 2017, 11:47 PM

Some work should be done to fix this issue. We can live with the bot sending duplicate notifications (which can be avoided if we instruct translator administrators not to reload the page over and over in case of a timeout; but we need to restore previous logs and make the extension log its use again to prevent abusive use of the feature. We won't have any translations if we cannot notify translators about new work. Thanks.

I just tested this and it worked for me: https://meta.wikimedia.org/wiki/Special:Log/notifytranslators . I deliberately chose a language that had nobody signed up to it (Extremaduran), then had @Quiddity sign up so my test would go to him and only him. Puzzlingly, the log entry says that it went to 2 recipients instead of 1 (maybe because he signed up for both talk page and email notifications?). In any case, the logging appears to be working.

I wanted to look at the errors behind the error codes that were helpfully provided earlier, to figure out what the issue was/is and if I maybe missed it by accident, but unfortunately those codes are now too old to be useful :/ . Our error log storage complies with the privacy policy, which I suppose is good, but it means that error information from March is gone now.

@Catrope Thanks for testing, but IMHO the test demonstrates that its broken. Being User:Translation Notification Bot blocked it cannot deliver any talk page message at Meta, so the log should display at least 1 failure. So while the system might be logging again, I feel that we cannot trust in them, which is as problematic as having no logs at all IMHO. Thanks.

@Catrope Thanks for testing, but IMHO the test demonstrates that its broken. Being User:Translation Notification Bot blocked it cannot deliver any talk page message at Meta, so the log should display at least 1 failure. So while the system might be logging again, I feel that we cannot trust in them, which is as problematic as having no logs at all IMHO. Thanks.

That is a separate issue. The failure is hardcoded to zero, and with current implementation it is not possible to relay any failure information (except as separate log messages), because the log message is inserted right after the delivery jobs have been pushed to the job queue (which is when you click the send button, and it must return a response to the user as quickly as possible).

I've just tested again the translation notification extension with the following result:

  • Sent a translation notification to the languages that already had something translated of the page: https://meta.wikimedia.org/wiki/Bassel_Khartabil
  • Got a fatal error (as expected): T63122#3518366: [WY2yRApAEKcAABSE9XgAAAAC] 2017-08-11 13:34:37: Fatal exception of type "Wikimedia\Rdbms\DBTransactionSizeError"
  • Translation notifications were delivered notwithstanding (all or just what the bot could fetch before the exception?).
  • The action was not logged.

Sorry but the extension is just broken. It breaks when several translation notifications are to be sent, and does not log things or logs them erroneously. If the Language-Team is not willing to work on fixing this extension we better disable it from our sites. This could be a useful feature but not in this state.

2017-08-11 13:34:37 [WY2yRApAEKcAABSE9XgAAAAC] mw1212 metawiki 1.30.0-wmf.13 exception ERROR: [WY2yRApAEKcAABSE9XgAAAAC] /wiki/Special:NotifyTranslators   Wikimedia\Rdbms\DBTransactionSizeError from line 1177 of /srv/mediawiki/php-1.30.0-wmf.13/includes/libs/rdbms/loadbalancer/LoadBalancer.php: Transaction spent 5.2927534580231 second(s) in writes, exceeding the 3 limit. {"exception_id":"WY2yRApAEKcAABSE9XgAAAAC","caught_by":"mwe_handler"}
[Exception Wikimedia\Rdbms\DBTransactionSizeError] (/srv/mediawiki/php-1.30.0-wmf.13/includes/libs/rdbms/loadbalancer/LoadBalancer.php:1177) Transaction spent 5.2927534580231 second(s) in writes, exceeding the 3 limit.
  #0 [internal function]: Closure$Wikimedia\Rdbms\LoadBalancer::approveMasterChanges(Wikimedia\Rdbms\DatabaseMysqli)
  #1 /srv/mediawiki/php-1.30.0-wmf.13/includes/libs/rdbms/loadbalancer/LoadBalancer.php(1502): call_user_func_array(Closure$Wikimedia\Rdbms\LoadBalancer::approveMasterChanges;300, array)
  #2 /srv/mediawiki/php-1.30.0-wmf.13/includes/libs/rdbms/loadbalancer/LoadBalancer.php(1187): Wikimedia\Rdbms\LoadBalancer->forEachOpenMasterConnection(Closure$Wikimedia\Rdbms\LoadBalancer::approveMasterChanges;300)
  #3 [internal function]: Wikimedia\Rdbms\LoadBalancer->approveMasterChanges(array)
  #4 /srv/mediawiki/php-1.30.0-wmf.13/includes/libs/rdbms/lbfactory/LBFactory.php(183): call_user_func_array(array, array)
  #5 [internal function]: Closure$Wikimedia\Rdbms\LBFactory::forEachLBCallMethod(Wikimedia\Rdbms\LoadBalancer, string, array)
  #6 /srv/mediawiki/php-1.30.0-wmf.13/includes/libs/rdbms/lbfactory/LBFactoryMulti.php(417): call_user_func_array(Closure$Wikimedia\Rdbms\LBFactory::forEachLBCallMethod;230, array)
  #7 /srv/mediawiki/php-1.30.0-wmf.13/includes/libs/rdbms/lbfactory/LBFactory.php(186): Wikimedia\Rdbms\LBFactoryMulti->forEachLB(Closure$Wikimedia\Rdbms\LBFactory::forEachLBCallMethod;230, array)
  #8 /srv/mediawiki/php-1.30.0-wmf.13/includes/libs/rdbms/lbfactory/LBFactory.php(223): Wikimedia\Rdbms\LBFactory->forEachLBCallMethod(string, array)
  #9 /srv/mediawiki/php-1.30.0-wmf.13/includes/MediaWiki.php(598): Wikimedia\Rdbms\LBFactory->commitMasterChanges(string, array)
  #10 /srv/mediawiki/php-1.30.0-wmf.13/includes/MediaWiki.php(571): MediaWiki::preOutputCommit(RequestContext, Closure$MediaWiki::main;324)
  #11 /srv/mediawiki/php-1.30.0-wmf.13/includes/MediaWiki.php(884): MediaWiki->doPreOutputCommit(Closure$MediaWiki::main;324)
  #12 /srv/mediawiki/php-1.30.0-wmf.13/includes/MediaWiki.php(523): MediaWiki->main()
  #13 /srv/mediawiki/php-1.30.0-wmf.13/index.php(43): MediaWiki->run()
  #14 /srv/mediawiki/w/index.php(3): include(string)
  #15 {main}

A second batch of translation notifications is being sent now (I've done nothing so I guess the jobs are requeued?) starting from https://meta.wikimedia.org/w/index.php?diff=17102364&oldid=17101919&rcid=10313334

Note that the user already received the same notification one hour ago so I guess this is what happens:

  • the extension tries to load not in the background the whole translation notification stuff
  • fatal error due to excesive db waiting
  • the extension knows the job didn't finished, so it requeues the job again, producing duplicate notifications, or triplicated in case it requeues again itself.

Wondering if we could somewhat mirror what MassMessage does with regards to the sending of massmessages. Even if there are hundreds of recipients it does not fatal.

Krinkle added a subscriber: Krinkle.

Not seen in WMF Logstash for at least 30 days (no matches for errors relating to "Special:NotifyTranslators" or SpecialNotifyTranslators).

However, given that this relates to an action that isn't very frequently performed by users, I'll leave this open for now for someone more familiar with the issue and able to perform the action, to confirm whether it is resolved

The page has a big warning to instructs not to use the page.

mmodell changed the subtype of this task from "Task" to "Production Error".Aug 28 2019, 11:10 PM