Enable user notifications for old unpublished ContentTranslation draft purge script
Open, MediumPublic

Description

Enable the ContentTranslation draft purge script [1] with the following steps:

  • 1. Manual dry-run with the notify-age-in-days param

Ran:

foreachwikiindblist '%% wikipedia.dblist - special.dblist - closed.dblist' extensions/ContentTranslation/scripts/purge-unpublished-drafts.php --age-in-days=455 --notify-age-in-days=425 |& tee -a ~/purgelog-20200826-dry-notify.txt

Result:

  • 2. Manual run with the notify-age-in-days param

Result:

  • 3. Add the notify-age-in-days param to the cron run [2]

[1] https://gerrit.wikimedia.org/g/mediawiki/extensions/ContentTranslation/+/eacc96672bd793a40bd57afde5fa11b1fbc997a7/scripts/purge-unpublished-drafts.php

[2] https://gerrit.wikimedia.org/r/plugins/gitiles/operations/puppet/+/refs/heads/production/modules/profile/manifests/mediawiki/maintenance/purge_old_cx_drafts.pp

Event Timeline

Restricted Application added a subscriber: Aklapper. Tue, Aug 25, 8:23 AM

Looks like we have an issue with the script:

-----------------------------------------------------------------                                                                                                                                                 
abwiki                                                                                                                                                                                                            
-----------------------------------------------------------------                                                                                                                                                 
abwiki:  $wgContentTranslationTranslateInTarget is enabled. This script must be run separately for each target language.                                                                                          
abwiki:  Running for language ab                                                                                                                                                                                  
abwiki:  DRY-RUN mode: no actions are taken on drafts unless you use --really                                                                                                                                     
abwiki:                                                                                                                                                                                                           
Wikimedia\Rdbms\DBQueryError from line 1699 of /srv/mediawiki/php-1.36.0-wmf.5/includes/libs/rdbms/database/Database.php: Error 1146: Table 'wikishared.cx_notification_log' doesn't exist (10.64.48.111)         
Function: ContentTranslation\Scripts\PurgeUnpublishedDrafts::execute
Query: SELECT  MAX(cxn_newest)  FROM `cx_notification_log`     LIMIT 1  

#0 /srv/mediawiki/php-1.36.0-wmf.5/includes/libs/rdbms/database/Database.php(1683): Wikimedia\Rdbms\Database->getQueryException('Table 'wikishar...', 1146, 'SELECT  MAX(cxn...', 'ContentTranslat...')
#1 /srv/mediawiki/php-1.36.0-wmf.5/includes/libs/rdbms/database/Database.php(1658): Wikimedia\Rdbms\Database->getQueryExceptionAndLog('Table 'wikishar...', 1146, 'SELECT  MAX(cxn...', 'ContentTranslat...')
#2 /srv/mediawiki/php-1.36.0-wmf.5/includes/libs/rdbms/database/Database.php(1227): Wikimedia\Rdbms\Database->reportQueryError('Table 'wikishar...', 1146, 'SELECT  MAX(cxn...', 'ContentTranslat...', false)
#3 /srv/mediawiki/php-1.36.0-wmf.5/includes/libs/rdbms/database/Database.php(1907): Wikimedia\Rdbms\Database->query('SELECT  MAX(cxn...', 'ContentTranslat...', 32)
#4 /srv/mediawiki/php-1.36.0-wmf.5/includes/libs/rdbms/database/Database.php(1746): Wikimedia\Rdbms\Database->select('cx_notification...', 'MAX(cxn_newest)', Array, 'ContentTranslat...', Array, Array)
#5 /srv/mediawiki/php-1.36.0-wmf.5/includes/libs/rdbms/database/DBConnRef.php(68): Wikimedia\Rdbms\Database->selectField('cx_notification...', 'MAX(cxn_newest)', Array, 'ContentTranslat...')
#6 /srv/mediawiki/php-1.36.0-wmf.5/includes/libs/rdbms/database/DBConnRef.php(300): Wikimedia\Rdbms\DBConnRef->__call('selectField', Array)
#7 /srv/mediawiki/php-1.36.0-wmf.5/extensions/ContentTranslation/scripts/purge-unpublished-drafts.php(102): Wikimedia\Rdbms\DBConnRef->selectField('cx_notification...', 'MAX(cxn_newest)', Array, 'ContentTranslat...')
#8 /srv/mediawiki/php-1.36.0-wmf.5/maintenance/doMaintenance.php(107): ContentTranslation\Scripts\PurgeUnpublishedDrafts->execute()
#9 /srv/mediawiki/php-1.36.0-wmf.5/extensions/ContentTranslation/scripts/purge-unpublished-drafts.php(304): require_once('/srv/mediawiki/...')
#10 /srv/mediawiki/multiversion/MWScript.php(101): require_once('/srv/mediawiki/...')
#11 {main}
KartikMistry triaged this task as Medium priority. Tue, Aug 25, 12:06 PM
KartikMistry updated the task description.

--notify-age-in-days value should be smaller than --age-in-days because otherwise we send a notification and immediately delete the drafts.
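To make the relationship between the two options concrete, here is a small Python sketch of the date arithmetic. It is an illustration only, not the script's actual code: the helper name `cutoff` is hypothetical, and it assumes each option is simply subtracted from the run date. The run date is taken from the dry-run log file name in the task description, and the option values are the ones in the dry-run command there.

```python
from datetime import date, timedelta


def cutoff(run_date: date, age_in_days: int) -> date:
    """Hypothetical helper: the date that lies age_in_days before run_date."""
    return run_date - timedelta(days=age_in_days)


run_date = date(2020, 8, 26)            # from purgelog-20200826-dry-notify.txt
purge_before = cutoff(run_date, 455)    # --age-in-days=455
notify_before = cutoff(run_date, 425)   # --notify-age-in-days=425

# Days between the notification cutoff and the purge cutoff.
grace_days = (notify_before - purge_before).days

print(purge_before, notify_before, grace_days)
# 2019-05-29 2019-06-28 30
```

With these values a draft is eligible for a reminder 30 days before it becomes eligible for purging. If the notify age were equal to or larger than the purge age, `grace_days` would be zero or negative, which is the "notify and immediately delete" case described above.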

Thanks. I'll update dry-run tomorrow.

KartikMistry renamed this task from Test user notification manually with cx draft purge script to Test user notifications for old unpublished cx draft purge script. Wed, Aug 26, 1:43 AM
KartikMistry updated the task description.

Dry run output updated with --notify-age-in-days=425

KartikMistry updated the task description. Wed, Aug 26, 8:32 AM

Change 622528 had a related patch set uploaded (by KartikMistry; owner: KartikMistry):
[operations/puppet@production] Add --notify-age-in-days option to notify users before draft purge

https://gerrit.wikimedia.org/r/622528

Mentioned in SAL (#wikimedia-operations) [2020-08-26T11:39:16Z] <kart_> Started manual run of ContentTranslation/scripts/purge-unpublished-drafts.php script on mwmaint1002 (T261189)

Mentioned in SAL (#wikimedia-operations) [2020-08-26T11:53:28Z] <kart_> Finished manual run of ContentTranslation/scripts/purge-unpublished-drafts.php script on mwmaint1002 (T261189)

@Nikerabbit Can you check the output of the manual run with --notify-age-in-days=425? It seems to run fine for arwiki at least, but it wasn't able to find any drafts, and the dates seem wrong for other wikis (e.g. 2019-06-27 and 2019-06-28).

KartikMistry updated the task description.

The code has a failsafe: on the very first run of the script, it only checks a range of 15 days, to avoid a flood of notifications. With this in mind, the output looks correct to me.
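A rough Python sketch of how such a failsafe could work is below. This is an assumption-laden illustration, not the logic of purge-unpublished-drafts.php: the function name `notification_window` and its parameters are hypothetical, and the rule that later runs start from the newest previously notified timestamp is an assumption.

```python
from datetime import datetime, timedelta
from typing import Optional, Tuple

# 15 days, as stated in the comment above.
FIRST_RUN_WINDOW = timedelta(days=15)


def notification_window(
    now: datetime,
    notify_age_in_days: int,
    last_notified_newest: Optional[datetime],
) -> Tuple[datetime, datetime]:
    """Return (start, end) of the draft timestamps to notify about.

    Hypothetical logic: on the first run (nothing logged yet) look back
    only FIRST_RUN_WINDOW; afterwards continue from the newest draft that
    was already notified (assumed behaviour).
    """
    end = now - timedelta(days=notify_age_in_days)
    if last_notified_newest is None:
        start = end - FIRST_RUN_WINDOW
    else:
        start = last_notified_newest
    return start, end


first_start, first_end = notification_window(datetime(2020, 8, 26), 425, None)
print(first_start.date(), first_end.date())
# 2019-06-13 2019-06-28
```

Under these assumptions, a first run with --notify-age-in-days=425 would only consider drafts from a 15-day range ending 425 days before the run, rather than every draft older than that.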

Change 622528 had a related patch set uploaded (by KartikMistry; owner: KartikMistry):
[operations/puppet@production] Add --notify-age-in-days option to notify users before draft purge

https://gerrit.wikimedia.org/r/622528

T183890 says: "After three months (90 days) since the user started a translation, the user is encouraged to continue, and reminded that it will be deleted after a year."

But using 90 would send a lot of notifications at once, so I think 435 is a good start. We can incrementally make the value smaller over time to spread out the notifications, and coordinate with @Pginer-WMF.

Sounds good to me. If I recall correctly we followed a similar approach with the manual runs.

Thanks @Pginer-WMF. I'll go ahead and ask for review and merge of the cronjob patch.

Change 622528 merged by Alexandros Kosiaris:
[operations/puppet@production] Add --notify-age-in-days option to notify users before draft purge

https://gerrit.wikimedia.org/r/622528

Thanks @akosiaris

Notes for QA

  • Script will run on 18th Sep.
  • Logs will be available on mwmaint2001: /var/log/mediawiki/mediawiki_job_purge_old_cx_drafts/syslog.log
  • Look for notifications that were not sent, or any similar issues.
KartikMistry updated the task description. Thu, Sep 3, 12:43 PM
Jpita added a subscriber: Jpita. Wed, Sep 9, 5:26 PM

This is in Pending input for QA until it can be tested on the 18th or 19th of September.

KartikMistry renamed this task from Test user notifications for old unpublished cx draft purge script to Enable user notifications for old unpublished ContentTranslation draft purge script. Thu, Sep 10, 6:38 AM
KartikMistry updated the task description.

After a successful run, we need a plan for how to bring the value down from 425 to 90.

The script run completed as scheduled by cron. @Nikerabbit Does this look OK?

Sample output showing a notification to a user:

Sep 18 10:33:19 mwmaint2001 mediawiki_job_purge_old_cx_drafts[709]: enwiki
Sep 18 10:33:19 mwmaint2001 mediawiki_job_purge_old_cx_drafts[709]: ----------------------------------------------------------------- 
Sep 18 10:33:19 mwmaint2001 mediawiki_job_purge_old_cx_drafts[709]: enwiki:  $wgContentTranslationTranslateInTarget is enabled. This script must be run separately for each target language. 
Sep 18 10:33:19 mwmaint2001 mediawiki_job_purge_old_cx_drafts[709]: enwiki:  Running for language en 
Sep 18 10:33:19 mwmaint2001 mediawiki_job_purge_old_cx_drafts[709]: enwiki:  Notifying users to continue their old translations 
Sep 18 10:33:19 mwmaint2001 mediawiki_job_purge_old_cx_drafts[709]: enwiki:  Selecting drafts with last modified timestamp between 2019-07-20 and 2019-07-21 
Sep 18 10:33:20 mwmaint2001 mediawiki_job_purge_old_cx_drafts[709]: enwiki:  Found 1 old draft 
Sep 18 10:33:20 mwmaint2001 mediawiki_job_purge_old_cx_drafts[709]: enwiki:    705471 2019-07-20T21:47:30+00:00 es→en       Fortaleza de Ujarma

I found two issues while inspecting the logs, one pre-existing and one new.

As background, the script loops over each wiki because we had trouble delivering notifications to users on a wiki other than the one where the script is run.

The pre-existing issue is that the list of wikis includes test.wikipedia.org. This is a special wiki where CX has been configured to "translate in the target", but which still uses the shared, global database tables.

This means that when the loop gets to testwiki, the purge script detects this condition and, instead of only going over drafts *->testwiki (I guess en, but it doesn't matter), it goes over all drafts, purging them and notifying their authors. So for any wiki that appears after testwiki in the list, our script would not do anything, since the testwiki run has already gone over them. To make things worse, the purges for testwiki succeed, but any notification will fail (because it would need to send the notification to a user on another wiki).

Suggested fixes:

  • Immediately remove testwiki from the script runs
  • Set up CX on testwiki so that it does not use the shared database tables, but a local copy of the tables instead. Once this is done, testwiki can be restored to the script runs.
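The testwiki problem described above can be illustrated with a toy Python model. Everything in it is hypothetical: the `Draft` fields, the `run_on_wiki` helper, the made-up draft IDs, and the simplification that a notification succeeds only when the author's account is on the wiki where the script runs.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class Draft:
    draft_id: int         # made-up IDs, not taken from the logs
    target_language: str
    author_wiki: str      # hypothetical: wiki where the author can be notified


def run_on_wiki(
    wiki: str, language: Optional[str], shared_drafts: List[Draft]
) -> Tuple[List[Draft], List[Draft], List[Draft]]:
    """Process the shared table from one wiki.

    language=None models a run that does not restrict itself to one
    target language and therefore goes over every draft.
    Returns (notified, failed_to_notify, remaining_drafts).
    """
    selected = [
        d for d in shared_drafts
        if language is None or d.target_language == language
    ]
    notified = [d for d in selected if d.author_wiki == wiki]
    failed = [d for d in selected if d.author_wiki != wiki]
    remaining = [d for d in shared_drafts if d not in selected]
    return notified, failed, remaining


drafts = [
    Draft(1, "zh", "zhwiki"),
    Draft(2, "vi", "viwiki"),
    Draft(3, "uk", "ukwiki"),
]

# testwiki runs before zhwiki and takes every draft in the shared table.
_, failed_on_test, left_over = run_on_wiki("testwiki", None, drafts)
zh_after_test, _, _ = run_on_wiki("zhwiki", "zh", left_over)

# With testwiki removed from the loop, zhwiki handles its own draft.
zh_without_test, _, _ = run_on_wiki("zhwiki", "zh", drafts)

print(len(failed_on_test), len(zh_after_test), len(zh_without_test))
# 3 0 1
```

In this model the testwiki run consumes all three drafts and fails to notify every author, so the later zhwiki run finds nothing; dropping testwiki from the loop lets zhwiki notify the author of its one draft.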

The new issue is that the script keeps track of the timestamp of the newest draft it has sent a notification for, but this logic was written before we started running the script in a loop over each wiki. The effect is that the timestamp gets more and more recent with each wiki where the script runs and finds such drafts, so for the later wikis the notification period gets shorter and shorter, missing many drafts whose authors should have been notified.

The timestamp is logged as a protection against sending multiple notifications when the script is run manually, or when the regular runs have overlapping notification periods.

Suggested fixes:

  • Amend the global database table to include a wiki-id.
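The shrinking notification period, and the effect of keying the log by wiki, can be shown with a toy Python simulation. It is only a sketch under stated assumptions: draft timestamps are abstract day numbers, the function `run_all` and the wiki draft lists are made up, and a single in-memory dictionary stands in for the cx_notification_log table.

```python
from typing import Dict, List

WINDOW_START = 0   # abstract day numbers, not real dates
WINDOW_END = 15


def run_all(
    drafts_by_wiki: Dict[str, List[int]], per_wiki_watermark: bool
) -> Dict[str, List[int]]:
    """Simulate one pass of the per-wiki loop.

    per_wiki_watermark=False models one shared "newest notified"
    value; True models a log that also records a wiki ID.
    """
    watermarks: Dict[str, int] = {}
    notified: Dict[str, List[int]] = {}
    for wiki, draft_days in drafts_by_wiki.items():
        key = wiki if per_wiki_watermark else "*"
        start = watermarks.get(key, WINDOW_START)
        # Only drafts newer than the watermark are considered.
        hits = [day for day in draft_days if start < day <= WINDOW_END]
        notified[wiki] = hits
        if hits:
            watermarks[key] = max(hits)
    return notified


drafts = {"arwiki": [3, 14], "enwiki": [2, 9, 15], "frwiki": [5]}

shared = run_all(drafts, per_wiki_watermark=False)
per_wiki = run_all(drafts, per_wiki_watermark=True)

print(shared)
# {'arwiki': [3, 14], 'enwiki': [15], 'frwiki': []}
print(per_wiki)
# {'arwiki': [3, 14], 'enwiki': [2, 9, 15], 'frwiki': [5]}
```

With the shared value, the first wiki pushes the watermark to day 14, so the later wikis skip their older drafts. With a per-wiki value, each wiki starts from its own history and nothing is missed.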

Accompanying evidence for the above:

Existing issue: testwiki output:

Sep 18 10:45:47 mwmaint2001 mediawiki_job_purge_old_cx_drafts[709]: -----------------------------------------------------------------
Sep 18 10:45:47 mwmaint2001 mediawiki_job_purge_old_cx_drafts[709]: test2wiki
Sep 18 10:45:47 mwmaint2001 mediawiki_job_purge_old_cx_drafts[709]: -----------------------------------------------------------------
Sep 18 10:45:47 mwmaint2001 mediawiki_job_purge_old_cx_drafts[709]: The following extensions are required to be installed for this script to run: ContentTranslation. Please enable them and then try again.
Sep 18 10:45:47 mwmaint2001 mediawiki_job_purge_old_cx_drafts[709]: -----------------------------------------------------------------
Sep 18 10:45:47 mwmaint2001 mediawiki_job_purge_old_cx_drafts[709]: testwiki
Sep 18 10:45:47 mwmaint2001 mediawiki_job_purge_old_cx_drafts[709]: -----------------------------------------------------------------
Sep 18 10:45:47 mwmaint2001 mediawiki_job_purge_old_cx_drafts[709]: testwiki:  Notifying users to continue their old translations
Sep 18 10:45:47 mwmaint2001 mediawiki_job_purge_old_cx_drafts[709]: testwiki:  Selecting drafts with last modified timestamp between 2019-07-20 and 2019-07-21
Sep 18 10:45:48 mwmaint2001 mediawiki_job_purge_old_cx_drafts[709]: testwiki:  Found 0 old drafts
Sep 18 10:45:48 mwmaint2001 mediawiki_job_purge_old_cx_drafts[709]: testwiki:
Sep 18 10:45:48 mwmaint2001 mediawiki_job_purge_old_cx_drafts[709]: testwiki:
Sep 18 10:45:48 mwmaint2001 mediawiki_job_purge_old_cx_drafts[709]: testwiki:  Purging old drafts
Sep 18 10:45:48 mwmaint2001 mediawiki_job_purge_old_cx_drafts[709]: testwiki:  Selecting drafts with last modified timestamp before 2019-06-21
Sep 18 10:45:49 mwmaint2001 mediawiki_job_purge_old_cx_drafts[709]: testwiki:  Found 196 old drafts
Sep 18 10:45:49 mwmaint2001 mediawiki_job_purge_old_cx_drafts[709]: testwiki:    658074 2019-06-06T01:47:56+00:00 en→zh       The Phoenix and the Turtle — PURGED
Sep 18 10:45:49 mwmaint2001 mediawiki_job_purge_old_cx_drafts[709]: testwiki:    670760 2019-06-06T04:13:05+00:00 en→zh       Harold Kelley — PURGED
Sep 18 10:45:50 mwmaint2001 mediawiki_job_purge_old_cx_drafts[709]: testwiki:    670794 2019-06-06T05:25:18+00:00 en→zh       Uralic Phonetic Alphabet — PURGED
Sep 18 10:45:50 mwmaint2001 mediawiki_job_purge_old_cx_drafts[709]: testwiki:    670797 2019-06-06T05:39:39+00:00 en→zh       Wikipedia:Competence is acquired — PURGED
Sep 18 10:45:51 mwmaint2001 mediawiki_job_purge_old_cx_drafts[709]: testwiki:    670796 2019-06-06T05:50:17+00:00 en→vi       Ctenopelmatinae — PURGED
Sep 18 10:45:51 mwmaint2001 mediawiki_job_purge_old_cx_drafts[709]: testwiki:    615702 2019-06-06T09:34:30+00:00 en→zh       Pakistan Cricket Board — PURGED
Sep 18 10:45:51 mwmaint2001 mediawiki_job_purge_old_cx_drafts[709]: testwiki:    632552 2019-06-06T14:59:29+00:00 en→zh       File:Clear logo 2014.png — PURGED
Sep 18 10:45:52 mwmaint2001 mediawiki_job_purge_old_cx_drafts[709]: testwiki:    671106 2019-06-06T15:02:47+00:00 en→tw       Freda Akosua Prempeh — PURGED
Sep 18 10:45:52 mwmaint2001 mediawiki_job_purge_old_cx_drafts[709]: testwiki:    565560 2019-06-06T21:00:54+00:00 en→zu       Flag of South Africa — PURGED
Sep 18 10:45:52 mwmaint2001 mediawiki_job_purge_old_cx_drafts[709]: testwiki:    671329 2019-06-07T02:12:47+00:00 en→vi       HE 1327-2326 — PURGED
Sep 18 10:45:53 mwmaint2001 mediawiki_job_purge_old_cx_drafts[709]: testwiki:    671370 2019-06-07T04:06:11+00:00 en→vi       Layer 2 Tunneling Protocol — PURGED
Sep 18 10:45:53 mwmaint2001 mediawiki_job_purge_old_cx_drafts[709]: testwiki:    671383 2019-06-07T04:48:00+00:00 en→vi       GSC 02620-00648 — PURGED
Sep 18 10:45:54 mwmaint2001 mediawiki_job_purge_old_cx_drafts[709]: testwiki:    671408 2019-06-07T06:17:29+00:00 en→vi       Cayrel's Star — PURGED
Sep 18 10:45:54 mwmaint2001 mediawiki_job_purge_old_cx_drafts[709]: testwiki:    671428 2019-06-07T06:51:53+00:00 en→ur       List of social networking websites — PURGED
Sep 18 10:45:54 mwmaint2001 mediawiki_job_purge_old_cx_drafts[709]: testwiki:    671435 2019-06-07T06:56:44+00:00 en→ur       List of international cricket centuries by Hashim Amla — PURGED
Sep 18 10:45:55 mwmaint2001 mediawiki_job_purge_old_cx_drafts[709]: testwiki:    671442 2019-06-07T07:00:50+00:00 en→ur       List of international cricket centuries by Inzamam-ul-Haq — PURGED
Sep 18 10:45:55 mwmaint2001 mediawiki_job_purge_old_cx_drafts[709]: testwiki:    671453 2019-06-07T07:10:08+00:00 en→ur       List of international cricket centuries at the Sheikh Abu Naser Stadium — PURGED
Sep 18 10:45:55 mwmaint2001 mediawiki_job_purge_old_cx_drafts[709]: testwiki:    657445 2019-06-07T07:12:20+00:00 ru→uk       Волгодонск (аэропорт) — PURGED
<snip>
Sep 18 10:47:05 mwmaint2001 mediawiki_job_purge_old_cx_drafts[709]: testwiki:  ERROR: Trying to notify unknown user with ID 58290406 about the deletion of the following page(s):
Sep 18 10:47:05 mwmaint2001 mediawiki_job_purge_old_cx_drafts[709]: testwiki:   * The Phoenix and the Turtle
Sep 18 10:47:05 mwmaint2001 mediawiki_job_purge_old_cx_drafts[709]: testwiki:  ERROR: Trying to notify unknown user with ID 46552016 about the deletion of the following page(s):
Sep 18 10:47:05 mwmaint2001 mediawiki_job_purge_old_cx_drafts[709]: testwiki:   * Harold Kelley
Sep 18 10:47:05 mwmaint2001 mediawiki_job_purge_old_cx_drafts[709]: testwiki:  ERROR: Trying to notify unknown user with ID 57871987 about the deletion of the following page(s):
Sep 18 10:47:05 mwmaint2001 mediawiki_job_purge_old_cx_drafts[709]: testwiki:   * Uralic Phonetic Alphabet
Sep 18 10:47:05 mwmaint2001 mediawiki_job_purge_old_cx_drafts[709]: testwiki:  ERROR: Trying to notify unknown user with ID 52486722 about the deletion of the following page(s):
Sep 18 10:47:05 mwmaint2001 mediawiki_job_purge_old_cx_drafts[709]: testwiki:   * Wikipedia:Competence is acquired
Sep 18 10:47:05 mwmaint2001 mediawiki_job_purge_old_cx_drafts[709]: testwiki:  ERROR: Trying to notify unknown user with ID 58002657 about the deletion of the following page(s):
Sep 18 10:47:05 mwmaint2001 mediawiki_job_purge_old_cx_drafts[709]: testwiki:   * Ctenopelmatinae
Sep 18 10:47:05 mwmaint2001 mediawiki_job_purge_old_cx_drafts[709]: testwiki:  ERROR: Trying to notify unknown user with ID 43450964 about the deletion of the following page(s):
Sep 18 10:47:05 mwmaint2001 mediawiki_job_purge_old_cx_drafts[709]: testwiki:   * Pakistan Cricket Board
Sep 18 10:47:05 mwmaint2001 mediawiki_job_purge_old_cx_drafts[709]: testwiki:   * 旧暦
Sep 18 10:47:05 mwmaint2001 mediawiki_job_purge_old_cx_drafts[709]: testwiki:  ERROR: Trying to notify unknown user with ID 58594490 about the deletion of the following page(s):
Sep 18 10:47:05 mwmaint2001 mediawiki_job_purge_old_cx_drafts[709]: testwiki:   * Freda Akosua Prempeh
Sep 18 10:47:07 mwmaint2001 mediawiki_job_purge_old_cx_drafts[709]: testwiki:  ERROR: Trying to notify unknown user with ID 52731911 about the deletion of the following page(s):
Sep 18 10:47:07 mwmaint2001 mediawiki_job_purge_old_cx_drafts[709]: testwiki:   * Layer 2 Tunneling Protocol
Sep 18 10:47:07 mwmaint2001 mediawiki_job_purge_old_cx_drafts[709]: testwiki:   * DirectSound
Sep 18 10:47:07 mwmaint2001 mediawiki_job_purge_old_cx_drafts[709]: testwiki:   * 2019 Copa América squads
Sep 18 10:47:07 mwmaint2001 mediawiki_job_purge_old_cx_drafts[709]: testwiki:   * 2019 UEFA European Under-21 Championship
Sep 18 10:47:07 mwmaint2001 mediawiki_job_purge_old_cx_drafts[709]: testwiki:  ERROR: Trying to notify unknown user with ID 2947337 about the deletion of the following page(s):
Sep 18 10:47:07 mwmaint2001 mediawiki_job_purge_old_cx_drafts[709]: testwiki:   * Волгодонск (аэропорт)
Sep 18 10:47:07 mwmaint2001 mediawiki_job_purge_old_cx_drafts[709]: testwiki:   * Багаевская
<snip>

New issue: repeated entries in the notification log:

wikiadmin@10.192.32.134(wikishared)> select * from cx_notification_log;
+--------+------------+----------------+
| cxn_id | cxn_date   | cxn_newest     |
+--------+------------+----------------+
|      1 | 2020-08-26 | 20190613044911 |
|      2 | 2020-08-26 | 20190627185916 |
|      3 | 2020-08-26 | 20190627214524 |
|      4 | 2020-08-26 | 20190627221214 |
|      5 | 2020-08-26 | 20190627231220 |
|      6 | 2020-08-26 | 20190627235336 |
|      7 | 2020-09-18 | 20190715065154 |
|      8 | 2020-09-18 | 20190719153240 |
|      9 | 2020-09-18 | 20190720180448 |
|     10 | 2020-09-18 | 20190720185818 |
|     11 | 2020-09-18 | 20190720213907 |
|     12 | 2020-09-18 | 20190720214730 |
|     13 | 2020-09-18 | 20190720215448 |
|     14 | 2020-09-18 | 20190720215754 |
+--------+------------+----------------+
14 rows in set (0.01 sec)

Change 628758 had a related patch set uploaded (by KartikMistry; owner: KartikMistry):
[operations/puppet@production] Exclude testwikis and private wikis from CX draft purge script run

https://gerrit.wikimedia.org/r/628758