Fourth manual run of unpublished draft purge script
Closed, Resolved, Public

Description

Fourth manual run of the unpublished draft purge script for CX. This run also tests how many drafts to target in the regular cron job with the new purge script.

Run: foreachwikiindblist '%% wikipedia.dblist - special.dblist - closed.dblist' extensions/ContentTranslation/scripts/purge-unpublished-drafts.php --age-in-days=800 --really |& tee -a ~/purgelog.txt on mwmaint1002.

  • age-in-days: 1300
  • number of drafts to purge = 1142
  • schedule: 2019-02-12

Tasks:

  • Dry Run
  • Actual Run

Also see:

Event Timeline

Pginer-WMF triaged this task as Medium priority. Aug 29 2018, 7:17 AM
santhosh renamed this task from Forth manual run of unpublished draft purge script to Fourth manual run of unpublished draft purge script. Sep 3 2018, 10:53 AM
santhosh updated the task description. (Show Details)
Pginer-WMF subscribed.

@KartikMistry, is this still blocked? What work are you waiting on before this can proceed?

Blocked on: https://phabricator.wikimedia.org/T203071

Thanks. We'll keep an eye on the blocking ticket: T203071: Allow the notification of deleted old translations to users missing a local account

I believe the correct (but untested) invocation is now:

foreachwikiindblist '%% wikipedia.dblist - special.dblist - closed.dblist' mwscript extensions/ContentTranslation/scripts/purge-unpublished-drafts.php --age-in-days=###

Set --age-in-days, and add --really and logging as needed. No need to include --wiki.

Thanks!

Planning to run the script on 17 December (to make sure that the script is available on all wikis).

Restricted Application changed the subtype of this task from "Task" to "Deadline". Jan 17 2019, 12:15 PM

Scheduling this run on 22/01 Morning (IST). Dry-run before that on 21/01.

Need to estimate drafts again with correct command.

It seems there is a somewhat unclear bug that causes waitForReplication to wait for the maximum time (60s on cli, 1s on web). @jcrespo thinks it is possibly related to T172497.

This potentially makes the script run very slowly, as we call waitForReplication frequently, probably more frequently than actually necessary (favoring simpler code). It is not clear how many waits are affected; so far it seems that in a given run of the script either all waits are affected or none. Assuming all waits would be affected (~60s per draft), the script run would take multiple days.

I see a couple of options:

  1. Skip waitForReplication in dry-run mode. We are not making any writes so it is not necessary.
  2. Reduce the number of waitForReplication calls. Some reduction is probably possible, but we don't want to risk creating lag ourselves.
  3. Hope that this problem only happens in dry-run mode, and real runs would not be affected, so that (1) would be enough.
  4. Try to get this issue analyzed by people familiar with the database code (via SoS etc.)

In my opinion (1) is uncontroversial and could be done easily right away. After that we can do proper estimates, so that we can do small actual runs to estimate how often this issue happens.
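To illustrate option (1), here is a minimal sketch in Python rather than the script's own PHP. Everything in it is hypothetical: `purge_drafts`, `delete_draft` and `wait_for_replication` are stand-in names, not the actual code in purge-unpublished-drafts.php. It only shows the idea that the replication wait follows real writes and is skipped entirely in a dry run.

```python
from typing import Callable, Iterable, List


def purge_drafts(
    draft_ids: Iterable[int],
    delete_draft: Callable[[int], None],
    wait_for_replication: Callable[[], None],
    dry_run: bool = True,
) -> List[int]:
    """Return the draft IDs that were purged (or would be, in a dry run).

    All names here are hypothetical stand-ins for illustration only.
    """
    processed: List[int] = []
    for draft_id in draft_ids:
        if not dry_run:
            delete_draft(draft_id)
            # Wait only after a real write. A dry run changes nothing,
            # so there is nothing for the replicas to catch up on.
            wait_for_replication()
        processed.append(draft_id)
    return processed
```

With this shape, a dry run over any number of drafts never touches the wait at all, so its duration no longer depends on whether the slow-wait bug is triggered.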

Change 485662 had a related patch set uploaded (by KartikMistry; owner: KartikMistry):
[mediawiki/extensions/ContentTranslation@master] Debug: Remove waitForReplication for dry-run

https://gerrit.wikimedia.org/r/485662

Change 485666 had a related patch set uploaded (by KartikMistry; owner: KartikMistry):
[mediawiki/extensions/ContentTranslation@wmf/1.33.0-wmf.13] Debug: Remove waitForReplication for dry-run

https://gerrit.wikimedia.org/r/485666

Change 485662 merged by jenkins-bot:
[mediawiki/extensions/ContentTranslation@master] Debug: Remove waitForReplication for dry-run

https://gerrit.wikimedia.org/r/485662

Change 485666 merged by jenkins-bot:
[mediawiki/extensions/ContentTranslation@wmf/1.33.0-wmf.13] Debug: Remove waitForReplication for dry-run

https://gerrit.wikimedia.org/r/485666

Mentioned in SAL (#wikimedia-operations) [2019-01-23T00:52:07Z] <ebernhardson@deploy1001> Synchronized php-1.33.0-wmf.13/extensions/ContentTranslation/scripts/purge-unpublished-drafts.php: SWAT T203059 ContentTranslation: Remove waitForReplication for dry-run (duration: 00m 55s)

Restricted Application changed the subtype of this task from "Deadline" to "Task". Jan 23 2019, 6:54 AM

I did a dry run today and 30000+ old drafts would be purged. However, we should wait until issues like T214402 are solved or at least investigated.

With --age-in-days=1260, there are around 2564 purgeable drafts. We should go with this, observe any issues and decide the next step. @Nikerabbit, does that sound good?

I would go even lower than that, just to be safe (in the sense that we don't have to monitor the script for hours and hours if it is slow) and to make sure there is limited impact if there is something wrong with the previous changes.

With --age-in-days=1300, there are 665 drafts to purge. Is that OK?

KartikMistry updated the task description. (Show Details)

Mentioned in SAL (#wikimedia-operations) [2019-02-12T06:04:30Z] <kart_> Fourth manual run of unpublished draft purge script (T203059)

Mentioned in SAL (#wikimedia-operations) [2019-02-12T06:33:25Z] <kart_> Finished fourth manual run of unpublished draft purge script (T203059)

The script run is done and 1142 drafts were purged. However, we still have the slow query issue mentioned earlier.

But how bad is it? How many times did it happen? How long did the purge script take in total?

If the log entries are accurate, that would be about 30 minutes for 1142 drafts, or 1.6 seconds per draft (including all the overhead of running the script for ~300 wikis and so on). This is more than an order of magnitude below the worst case of 60 seconds per draft.
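For anyone who wants to redo this estimate, a small Python sketch follows. It takes the two SAL timestamps and the draft count from this task as inputs; the function names are made up for illustration. Using the exact timestamps gives slightly under 30 minutes and a per-draft figure a little below the rounded 1.6 seconds, which does not change the conclusion.

```python
from datetime import datetime


def elapsed_seconds(start_iso: str, end_iso: str) -> float:
    """Seconds between two ISO 8601 timestamps (no timezone suffix)."""
    start = datetime.fromisoformat(start_iso)
    end = datetime.fromisoformat(end_iso)
    return (end - start).total_seconds()


def worst_case_hours(drafts: int, wait_seconds: float = 60.0) -> float:
    """Hours spent if every draft hit the maximum replication wait."""
    return drafts * wait_seconds / 3600


# Inputs taken from the SAL entries and the run result in this task.
drafts_purged = 1142
run_seconds = elapsed_seconds("2019-02-12T06:04:30", "2019-02-12T06:33:25")
per_draft = run_seconds / drafts_purged

print(f"run time: {run_seconds / 60:.1f} min")
print(f"per draft: {per_draft:.2f} s")
print(f"worst case for the same drafts: {worst_case_hours(drafts_purged):.1f} h")
print(f"observed run was {60.0 / per_draft:.0f}x faster than the worst case")
```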

13 times to be exact in this run.

Can this be closed now? What are the next steps? Is a 5th manual run needed or can we go with the automatic cron script (T189091) now?

I would prefer we do further manual runs to clear the backlog (in other words: get "age in days" to the same value that we will use in the scheduled script).

Now that we have confirmed the script works well enough, we can clear the backlog quicker. Currently we can purge about 2000 drafts in one hour of real time, so we either need multiple of those 1-hour runs or longer runs, or both.
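As a rough planning aid, the arithmetic can be sketched in Python. The rate of about 2000 drafts per hour comes from the comment above; the backlog size is an assumption, here the 30000+ figure from the earlier dry run used as a round illustrative number, since the real backlog depends on the final "age in days" value. The function names are hypothetical.

```python
import math


def hours_to_clear(backlog_drafts: int, drafts_per_hour: float) -> float:
    """Total script time needed at a steady purge rate."""
    return backlog_drafts / drafts_per_hour


def runs_needed(backlog_drafts: int, drafts_per_hour: float, hours_per_run: float) -> int:
    """Number of manual runs of a given length, rounded up."""
    drafts_per_run = drafts_per_hour * hours_per_run
    return math.ceil(backlog_drafts / drafts_per_run)


# Illustrative backlog only; the real number depends on --age-in-days.
backlog = 30000
rate = 2000  # drafts per hour, as estimated in this thread

print(f"total time: {hours_to_clear(backlog, rate):.1f} h")
print(f"1-hour runs needed: {runs_needed(backlog, rate, 1.0)}")
print(f"3-hour runs needed: {runs_needed(backlog, rate, 3.0)}")
```

Under these assumptions the trade-off is simply many short runs versus a few long ones; the total script time stays the same.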

Ok. I created a follow-up to run as many iterations of the manual process as needed to reduce the backlog: T216509: Complete manual rounds of unpublished draft purge script until the backlog is clean