Dry-run, then actually run updateVarDumps
Closed, Resolved, Public

Description

What: Dry-run the updateVarDumps.php script for AbuseFilter, audit the results, then run it for real.
When: Whenever this patch reaches production.
Why: To kill many layers of back-compat and a lot of tech debt (see parent task).
Where: All wikis - beta cluster first, then production.
Who: Whoever wants to do that. (@Daimona can do the Beta Cluster part).

How:

  1. First of all, run updateVarDumps.php --dry-run and post the results.
  2. @Daimona will audit the results
  3. If necessary, we may have to re-run the script with --dry-run-verbose for selected wikis, then repeat step 2.
  4. Find a DateTime suitable for both of us
  5. Time to run it for real. The ideal steps are:
    1. Measure the size of the abuse_filter_log table (on enwiki?); measure the size of the ExternalStore (I don't know whether it's feasible)
    2. Run the script!
    3. Measure the sizes again and check how much space was saved.
  6. Done!
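
Steps 5.1 and 5.3 could be sketched roughly as follows. This is only an illustration under assumptions: the table size is read from information_schema (which reports approximate sizes for InnoDB), and space_saved is a hypothetical helper, not part of AbuseFilter. Measuring the ExternalStore is not covered.

```shell
# Query (assumption: run against the wiki's database, e.g. through the mysql
# client) reporting the approximate size of abuse_filter_log in bytes.
SIZE_QUERY="SELECT data_length + index_length
FROM information_schema.TABLES
WHERE table_schema = DATABASE() AND table_name = 'abuse_filter_log'"

# space_saved BEFORE AFTER
# Hypothetical helper: given the sizes in bytes measured before and after the
# run, print the bytes saved and the (integer) percentage saved.
space_saved() {
    before=$1
    after=$2
    saved=$(( before - after ))
    percent=0
    if [ "$before" -gt 0 ]; then
        percent=$(( saved * 100 / before ))
    fi
    echo "$saved bytes saved (${percent}%)"
}
```

For example, space_saved 2000 1500 prints "500 bytes saved (25%)".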

Command to run

mwscript extensions/AbuseFilter/maintenance/updateVarDumps.php --print-orphaned-records --progress-markers [ --dry-run ] > foobar.log

Note: Make sure to redirect the output to a file! The script will print thousands of orphaned records, up to 22 million for enwiki. You can tee the output for small wikis if you wish.
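
A minimal wrapper sketch for the dry-run-then-real-run sequence, using only the flags from the command above. run_update and the MWSCRIPT variable are hypothetical names introduced here so that the command can be previewed (for example with MWSCRIPT=echo) before touching a real wiki; the log file names are also an assumption.

```shell
# MWSCRIPT defaults to the real wrapper; set MWSCRIPT=echo to preview the command.
MWSCRIPT=${MWSCRIPT:-mwscript}

# run_update WIKI [dry]
# Runs updateVarDumps.php for one wiki and redirects its output to a per-wiki
# log file. Pass "dry" as the second argument to add --dry-run.
run_update() {
    wiki=$1
    mode=${2:-real}
    extra=""
    logfile="$wiki.log"
    if [ "$mode" = "dry" ]; then
        extra="--dry-run"
        logfile="$wiki.dry-run.log"
    fi
    # $extra is intentionally unquoted so that it disappears when empty.
    "$MWSCRIPT" extensions/AbuseFilter/maintenance/updateVarDumps.php \
        --wiki="$wiki" --print-orphaned-records --progress-markers $extra \
        > "$logfile"
}
```

Keeping the dry-run log in a separate file means the audited output is still around for comparison after the real run.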

  • Beta Cluster
  • Closed wikis
  • Test wikis
  • Group0 wikis
  • Group1 wikis except commonswiki
  • Group2 wikis except wikis that are in large.dblist
  • Large wikis
    • svwiki
    • kowiki
    • trwiki
    • cswiki
    • jawiki
    • itwiki
    • nlwiki
    • frwiki
    • ruwiki
    • fawiki
    • zhwiki
    • arwiki
    • ptwiki
    • commonswiki
    • eswiki
    • enwiki

Event Timeline

Is the problem that the data actually contains serialized WikiPage objects?

More generally, the data contains serialized objects of many different types. WikiPage is the main problem, but we also have Article, Revision and whatnot, especially if we consider the class properties of these objects (e.g. T187731).

Can you paste an example blob or two somewhere?

I can't, because I do not have prod access, and the Beta Cluster was cleaned up a few months ago. With shell access, you can easily retrieve blobs by following the process described in T187731, i.e.

  • Pick any AbuseLog entry from a group2 wiki older than 28/02/2013 (when rEABF42bd0d84f4244ca2304c6d161f71d90a6a53030c was merged), note its ID
  • Run the following query: SELECT afl_var_dump FROM abuse_filter_log WHERE afl_id = <your ID>
  • Feed the numeric value you got from that query to maintenance/fetchText.php

Very old entries (2008 or before, I think) might not work, because they didn't use the text table.
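
The retrieval steps above might look roughly like this in a shell. It is a sketch, not a tested procedure: afl_var_dump_query and fetch_blob are hypothetical helper names, MWSCRIPT is a hypothetical override for the mwscript wrapper, and it is assumed that fetchText.php reads text IDs from standard input.

```shell
MWSCRIPT=${MWSCRIPT:-mwscript}

# afl_var_dump_query AFL_ID
# Builds the query from the steps above; refuses anything that is not a
# plain integer ID.
afl_var_dump_query() {
    case $1 in
        ''|*[!0-9]*)
            echo "afl_id must be a non-negative integer" >&2
            return 1
            ;;
    esac
    echo "SELECT afl_var_dump FROM abuse_filter_log WHERE afl_id = $1"
}

# fetch_blob WIKI TEXT_ID
# Assumption: fetchText.php reads text IDs from standard input, one per line.
fetch_blob() {
    echo "$2" | "$MWSCRIPT" maintenance/fetchText.php --wiki="$1"
}
```

The numeric value returned by the query is what would be passed as TEXT_ID to fetch_blob.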

eprodromou subscribed.

OK, looks like we'll pick this up in Clinic Duty.

Could someone please restart this script? It's blocking a lot of work in the AbuseFilter codebase, because we cannot touch any of the affected classes (most notably AbuseFilterVariableHolder). Trying to do so would result in all old entries being lost. Also, might this be run with a lower sleep between batches so that it completes in a shorter time? Thanks!

{{ping}} any updates from clinic duty?

Martin and I will run it (I'll be there mostly for emotional support though)

To prepare for the run, I ran foreachwikiindblist group0 mysql.php -- -e 'select DATABASE(), count(*) from updatelog where ul_key="UpdateVarDumps"' for group0, group1 and group2, to figure out where the script had (not) been run.

That leaves us with the following group0 and group1 wikis:

[urbanecm@mwmaint2001 ~/updateVarDumps]$ grep --no-filename '   0' group0-parsed.log group1-parsed.log
apiportalwiki   0
commonswiki     0
enwikisource    0
jawikivoyage    0
thankyouwiki    0
[urbanecm@mwmaint2001 ~/updateVarDumps]$

plus all the group2 wikis.
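
As an alternative to grepping for the padded zero, the same filter could be sketched with awk, comparing the count column numerically so that it does not depend on column spacing. This assumes the parsed logs contain whitespace-separated "dbname count" lines as in the output above; pending_wikis is a hypothetical helper name.

```shell
# pending_wikis FILE...
# Each input line is "<dbname> <count>"; print the dbnames whose count is 0,
# i.e. the wikis where the script has not been run yet.
pending_wikis() {
    awk 'NF == 2 && $2 == 0 { print $1 }' "$@"
}
```

For the output above, pending_wikis group0-parsed.log group1-parsed.log would list the same five wikis, one per line, without the count column.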

Mentioned in SAL (#wikimedia-operations) [2020-10-19T10:53:45Z] <Urbanecm> [urbanecm@mwmaint2001 ~/updateVarDumps/script]$ mwscript extensions/AbuseFilter/maintenance/updateVarDumps.php --wiki=jawikivoyage --print-orphaned-records-to=- --progress-markers # T246539

Mentioned in SAL (#wikimedia-operations) [2020-10-19T10:57:23Z] <Urbanecm> Start mwscript extensions/AbuseFilter/maintenance/updateVarDumps.php --wiki=enwikisource --print-orphaned-records-to=/tmp/urbanecm/enwikisource-orphaned.log --progress-markers in a tmux session named updateVarDumps at mwmaint2001 (T246539)

Mentioned in SAL (#wikimedia-operations) [2020-10-19T11:13:12Z] <Urbanecm> Manually run mwscript extensions/AbuseFilter/maintenance/updateVarDumps.php --wiki=$wiki --print-orphaned-records-to=/tmp/urbanecm/$wiki-orphaned.log --progress-markers > $wiki.log for several small group2 wikis (T246539)

Mentioned in SAL (#wikimedia-operations) [2020-10-19T11:40:41Z] <Urbanecm> [urbanecm@mwmaint2001 ~/updateVarDumps/script]$ while read wiki; do echo "Processing $wiki"; mwscript extensions/AbuseFilter/maintenance/updateVarDumps.php --wiki=$wiki --print-orphaned-records-to=/tmp/urbanecm/$wiki-orphaned.log --progress-markers > $wiki.log; done < ../small-group2.dblist # T246539 # small-group2.dblist is wikis from small.dblist that are also in group2.dblist
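
The loop logged above can be written a little more defensively. The following is only a sketch: run_dblist and the MWSCRIPT override are hypothetical names, and skipping blank lines and lines starting with # is an assumption about the dblist file, not something the log states.

```shell
# MWSCRIPT defaults to the real wrapper; set MWSCRIPT=echo to preview the commands.
MWSCRIPT=${MWSCRIPT:-mwscript}

# run_dblist DBLIST ORPHAN_DIR
# Runs updateVarDumps.php for every wiki listed (one dbname per line) in DBLIST,
# writing orphaned records to ORPHAN_DIR and the script output to ./<wiki>.log.
run_dblist() {
    dblist=$1
    orphan_dir=$2
    while IFS= read -r wiki; do
        # Skip blank lines and comment lines.
        case $wiki in ''|'#'*) continue ;; esac
        echo "Processing $wiki"
        # stdin is taken from /dev/null so the script cannot consume
        # the remaining lines of the dblist.
        "$MWSCRIPT" extensions/AbuseFilter/maintenance/updateVarDumps.php \
            --wiki="$wiki" \
            --print-orphaned-records-to="$orphan_dir/$wiki-orphaned.log" \
            --progress-markers > "$wiki.log" < /dev/null
    done < "$dblist"
}
```

Quoting $wiki and using read -r guard against stray whitespace or backslashes in the list.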

Mentioned in SAL (#wikimedia-operations) [2020-10-19T11:42:00Z] <Urbanecm> End of mwscript extensions/AbuseFilter/maintenance/updateVarDumps.php --wiki=enwikisource --print-orphaned-records-to=/tmp/urbanecm/enwikisource-orphaned.log --progress-markers (T246539)

Mentioned in SAL (#wikimedia-operations) [2020-10-19T11:43:34Z] <Urbanecm> End of [urbanecm@mwmaint2001 ~/updateVarDumps/script]$ while read wiki; do echo "Processing $wiki"; mwscript extensions/AbuseFilter/maintenance/updateVarDumps.php --wiki=$wiki --print-orphaned-records-to=/tmp/urbanecm/$wiki-orphaned.log --progress-markers > $wiki.log; done < ../small-group2.dblist # T246539 # small-group2.dblist is wikis from small.dblist that are also in group2.dblist

Mentioned in SAL (#wikimedia-operations) [2020-10-19T13:34:31Z] <Urbanecm> Start of [urbanecm@mwmaint2001 ~/updateVarDumps/output/group2-medium]$ while read wiki; do echo "Processing $wiki"; mwscript extensions/AbuseFilter/maintenance/updateVarDumps.php --wiki=$wiki --print-orphaned-records-to=/tmp/urbanecm/$wiki-orphaned.log --progress-markers > output/$wiki.log; done < wikis.dblist (T246539; wikis.dblist is medium wikis from group2.dblist)

Outputs are below (two files per wiki):

Checked, nothing special happened here (i.e. steps 1 and 2 didn't find any row)

Mentioned in SAL (#wikimedia-operations) [2020-10-21T08:46:36Z] <Urbanecm> [urbanecm@mwmaint2001 ~/updateVarDumps/output/group2-medium/output]$ mwscript extensions/AbuseFilter/maintenance/updateVarDumps.php --wiki=apiportalwiki # T246539

Mentioned in SAL (#wikimedia-operations) [2020-10-21T08:50:06Z] <Urbanecm> mwscript extensions/AbuseFilter/maintenance/updateVarDumps.php --wiki=$wiki --print-orphaned-records-to=/tmp/urbanecm/$wiki-orphaned.log --progress-markers > $wiki.log # wiki=cebwiki; T246539

Mentioned in SAL (#wikimedia-operations) [2020-10-21T08:51:59Z] <Urbanecm> Start of mwscript extensions/AbuseFilter/maintenance/updateVarDumps.php --wiki=$wiki --print-orphaned-records-to=/tmp/urbanecm/$wiki-orphaned.log --progress-markers > $wiki.log (wiki=viwiki; T246539)

Mentioned in SAL (#wikimedia-operations) [2020-10-21T09:30:31Z] <Urbanecm> End of mwscript extensions/AbuseFilter/maintenance/updateVarDumps.php --wiki=$wiki --print-orphaned-records-to=/tmp/urbanecm/$wiki-orphaned.log --progress-markers > $wiki.log (wiki=viwiki; T246539)

Mentioned in SAL (#wikimedia-operations) [2020-10-21T09:37:32Z] <Urbanecm> mwscript extensions/AbuseFilter/maintenance/updateVarDumps.php --wiki=$wiki --print-orphaned-records-to=/tmp/urbanecm/$wiki-orphaned.log --progress-markers > $wiki.log # wiki=warwiki; T246539

Mentioned in SAL (#wikimedia-operations) [2020-10-21T09:38:18Z] <Urbanecm> Start of mwscript extensions/AbuseFilter/maintenance/updateVarDumps.php --wiki=$wiki --print-orphaned-records-to=/tmp/urbanecm/$wiki-orphaned.log --progress-markers > $wiki.log (wiki=shwiki; T246539)

Mentioned in SAL (#wikimedia-operations) [2020-10-21T09:42:17Z] <Urbanecm> End of mwscript extensions/AbuseFilter/maintenance/updateVarDumps.php --wiki=$wiki --print-orphaned-records-to=/tmp/urbanecm/$wiki-orphaned.log --progress-markers > $wiki.log (wiki=shwiki; T246539)

Mentioned in SAL (#wikimedia-operations) [2020-10-21T09:42:40Z] <Urbanecm> Start of mwscript extensions/AbuseFilter/maintenance/updateVarDumps.php --wiki=$wiki --print-orphaned-records-to=/tmp/urbanecm/$wiki-orphaned.log --progress-markers > $wiki.log (wiki=nowiki; T246539)

Output is at https://usercontent.irccloud-cdn.com/file/ScOyahmg/group2-medium.tar.bz2 (sorry for not using Phab's file storage, apparently I can't upload a ~7 MB file there).

Mentioned in SAL (#wikimedia-operations) [2020-10-21T10:00:29Z] <Urbanecm> End of mwscript extensions/AbuseFilter/maintenance/updateVarDumps.php --wiki=$wiki --print-orphaned-records-to=/tmp/urbanecm/$wiki-orphaned.log --progress-markers > $wiki.log (wiki=nowiki; T246539)

Mentioned in SAL (#wikimedia-operations) [2020-10-21T10:01:47Z] <Urbanecm> Start of mwscript extensions/AbuseFilter/maintenance/updateVarDumps.php --wiki=$wiki --print-orphaned-records-to=/tmp/urbanecm/$wiki-orphaned.log --progress-markers > $wiki.log (wiki=srwiki; T246539)

Mentioned in SAL (#wikimedia-operations) [2020-10-21T10:37:41Z] <Urbanecm> End of mwscript extensions/AbuseFilter/maintenance/updateVarDumps.php --wiki=$wiki --print-orphaned-records-to=/tmp/urbanecm/$wiki-orphaned.log --progress-markers > $wiki.log (wiki=srwiki; T246539)

Mentioned in SAL (#wikimedia-operations) [2020-10-21T10:38:27Z] <Urbanecm> Start of mwscript extensions/AbuseFilter/maintenance/updateVarDumps.php --wiki=$wiki --print-orphaned-records-to=/tmp/urbanecm/$wiki-orphaned.log --progress-markers > $wiki.log (wiki=rowiki; T246539)

Mentioned in SAL (#wikimedia-operations) [2020-10-22T11:54:19Z] <Urbanecm> Start of mwscript extensions/AbuseFilter/maintenance/updateVarDumps.php --wiki=$wiki --print-orphaned-records-to=/tmp/urbanecm/$wiki-orphaned.log --progress-markers > $wiki.log in a tmux session updateVarDumps at mwmaint2001 (wiki=huwiki; T246539)

Mentioned in SAL (#wikimedia-operations) [2020-10-29T13:23:14Z] <Urbanecm> Start of mwscript extensions/AbuseFilter/maintenance/updateVarDumps.php --wiki=$wiki --print-orphaned-records-to=/tmp/urbanecm/$wiki-orphaned.log --progress-markers > $wiki.log in a tmux session updateVarDumps at mwmaint2001 (wiki=idwiki; T246539)

Mentioned in SAL (#wikimedia-operations) [2020-10-29T13:25:20Z] <Urbanecm> Correction: Obviously 1002 (T246539)

Mentioned in SAL (#wikimedia-operations) [2020-10-29T19:22:29Z] <Urbanecm> Start of mwscript extensions/AbuseFilter/maintenance/updateVarDumps.php --wiki=$wiki --print-orphaned-records-to=/tmp/urbanecm/$wiki-orphaned.log --progress-markers > $wiki.log in a tmux session on mwmaint1002 (wiki=ukwiki; T246539)

Mentioned in SAL (#wikimedia-operations) [2020-11-04T10:23:28Z] <Urbanecm> Start of mwscript extensions/AbuseFilter/maintenance/updateVarDumps.php --wiki=$wiki --print-orphaned-records-to=/tmp/urbanecm/$wiki-orphaned.log --progress-markers > $wiki.log in a tmux session updateVarDumps at mwmaint1002 (wiki=fiwiki; T246539)

Mentioned in SAL (#wikimedia-operations) [2020-11-05T11:05:37Z] <Urbanecm> Start of mwscript extensions/AbuseFilter/maintenance/updateVarDumps.php --wiki=$wiki --print-orphaned-records-to=/tmp/urbanecm/$wiki-orphaned.log --progress-markers > $wiki.log in a tmux session updateVarDumps at mwmaint1002 (wiki=dewiki; T246539)

Mentioned in SAL (#wikimedia-operations) [2020-11-09T09:54:33Z] <Urbanecm> Start of mwscript extensions/AbuseFilter/maintenance/updateVarDumps.php --wiki=$wiki --print-orphaned-records-to=/tmp/urbanecm/$wiki-orphaned.log --progress-markers > $wiki.log in a tmux session at mwmaint1002 (wiki=svwiki; T246539)

Mentioned in SAL (#wikimedia-operations) [2020-11-09T14:03:12Z] <Urbanecm> Start of mwscript extensions/AbuseFilter/maintenance/updateVarDumps.php --wiki=$wiki --print-orphaned-records-to=/tmp/urbanecm/$wiki-orphaned.log --progress-markers > $wiki.log in a tmux session at mwmaint1002 (wiki=kowiki; T246539)

Mentioned in SAL (#wikimedia-operations) [2020-11-09T16:34:19Z] <Urbanecm> End of mwscript extensions/AbuseFilter/maintenance/updateVarDumps.php --wiki=$wiki --print-orphaned-records-to=/tmp/urbanecm/$wiki-orphaned.log --progress-markers > $wiki.log in a tmux session at mwmaint1002 (wiki=kowiki; T246539)

Mentioned in SAL (#wikimedia-operations) [2020-11-09T16:34:45Z] <Urbanecm> Start of mwscript extensions/AbuseFilter/maintenance/updateVarDumps.php --wiki=$wiki --print-orphaned-records-to=/tmp/urbanecm/$wiki-orphaned.log --progress-markers > $wiki.log in a tmux session at mwmaint1002 (wiki=trwiki; T246539)

Mentioned in SAL (#wikimedia-operations) [2020-11-12T13:38:01Z] <Urbanecm> Start of mwscript extensions/AbuseFilter/maintenance/updateVarDumps.php --wiki=$wiki --print-orphaned-records-to=/tmp/urbanecm/$wiki-orphaned.log --progress-markers > $wiki.log in a tmux session updateVarDumps at mwmaint1002 (wiki=cswiki; T246539)

Mentioned in SAL (#wikimedia-operations) [2020-11-12T16:11:53Z] <Urbanecm> End of mwscript extensions/AbuseFilter/maintenance/updateVarDumps.php --wiki=$wiki --print-orphaned-records-to=/tmp/urbanecm/$wiki-orphaned.log --progress-markers > $wiki.log in a tmux session updateVarDumps at mwmaint1002 (wiki=cswiki; T246539)

Mentioned in SAL (#wikimedia-operations) [2020-11-12T16:12:23Z] <Urbanecm> Start of mwscript extensions/AbuseFilter/maintenance/updateVarDumps.php --wiki=$wiki --print-orphaned-records-to=/tmp/urbanecm/$wiki-orphaned.log --progress-markers > $wiki.log in a tmux at mwmaint1002 (wiki=jawiki; T246539)

Mentioned in SAL (#wikimedia-operations) [2020-11-17T14:37:10Z] <Urbanecm> Start of mwscript extensions/AbuseFilter/maintenance/updateVarDumps.php --wiki=$wiki --print-orphaned-records-to=/tmp/urbanecm/$wiki-orphaned.log --progress-markers > $wiki.log in a tmux at mwmaint1002 (wiki=itwiki; T246539)

Mentioned in SAL (#wikimedia-operations) [2020-11-17T20:42:59Z] <Urbanecm> End of mwscript extensions/AbuseFilter/maintenance/updateVarDumps.php --wiki=$wiki --print-orphaned-records-to=/tmp/urbanecm/$wiki-orphaned.log --progress-markers > $wiki.log in a tmux at mwmaint1002 (wiki=itwiki; T246539)

Mentioned in SAL (#wikimedia-operations) [2020-11-18T07:28:33Z] <Urbanecm> Start of mwscript extensions/AbuseFilter/maintenance/updateVarDumps.php --wiki=$wiki --print-orphaned-records-to=/tmp/urbanecm/$wiki-orphaned.log --progress-markers > $wiki.log in a tmux at mwmaint1002 (wiki=nlwiki; T246539)

Mentioned in SAL (#wikimedia-operations) [2020-11-18T11:56:17Z] <Urbanecm> End of mwscript extensions/AbuseFilter/maintenance/updateVarDumps.php --wiki=$wiki --print-orphaned-records-to=/tmp/urbanecm/$wiki-orphaned.log --progress-markers > $wiki.log in a tmux at mwmaint1002 (wiki=nlwiki; T246539)

Mentioned in SAL (#wikimedia-operations) [2020-11-18T11:56:29Z] <Urbanecm> Start of mwscript extensions/AbuseFilter/maintenance/updateVarDumps.php --wiki=$wiki --print-orphaned-records-to=/tmp/urbanecm/$wiki-orphaned.log --progress-markers > $wiki.log in a tmux at mwmaint1002 (wiki=frwiki; T246539)

Mentioned in SAL (#wikimedia-operations) [2020-11-19T11:00:12Z] <Urbanecm> Start of mwscript extensions/AbuseFilter/maintenance/updateVarDumps.php --wiki=$wiki --print-orphaned-records-to=/tmp/urbanecm/$wiki-orphaned.log --progress-markers > $wiki.log in a tmux at mwmaint1002 (wiki=ruwiki; T246539)

Mentioned in SAL (#wikimedia-operations) [2020-11-20T12:11:01Z] <Urbanecm> Start of mwscript extensions/AbuseFilter/maintenance/updateVarDumps.php --wiki=$wiki --print-orphaned-records-to=/tmp/urbanecm/$wiki-orphaned.log --progress-markers > $wiki.log in a tmux at mwmaint1002 (wiki=fawiki; T246539)

Mentioned in SAL (#wikimedia-operations) [2020-11-23T12:01:32Z] <Urbanecm> Start of mwscript extensions/AbuseFilter/maintenance/updateVarDumps.php --wiki=$wiki --print-orphaned-records-to=/tmp/urbanecm/$wiki-orphaned.log --progress-markers > $wiki.log in a tmux at mwmaint1002 (wiki=zhwiki; T246539)

Mentioned in SAL (#wikimedia-operations) [2020-12-01T22:41:27Z] <Urbanecm> Start of mwscript extensions/AbuseFilter/maintenance/updateVarDumps.php --wiki=$wiki --print-orphaned-records-to=/tmp/urbanecm/$wiki-orphaned.log --progress-markers > $wiki.log in a tmux at mwmaint1002 (wiki=arwiki; T246539)

Mentioned in SAL (#wikimedia-operations) [2020-12-02T14:10:22Z] <Urbanecm> Start of mwscript extensions/AbuseFilter/maintenance/updateVarDumps.php --wiki=$wiki --print-orphaned-records-to=/tmp/urbanecm/$wiki-orphaned.log --progress-markers > $wiki.log in a tmux at mwmaint1002 (wiki=ptwiki; T246539)

Mentioned in SAL (#wikimedia-operations) [2020-12-02T14:12:01Z] <Urbanecm> Start of mwscript extensions/AbuseFilter/maintenance/updateVarDumps.php --wiki=$wiki --print-orphaned-records-to=/tmp/urbanecm/$wiki-orphaned.log --progress-markers > $wiki.log in a tmux at mwmaint1002 (wiki=commonswiki; T246539)

Mentioned in SAL (#wikimedia-operations) [2020-12-03T11:52:17Z] <Urbanecm> Start of mwscript extensions/AbuseFilter/maintenance/updateVarDumps.php --wiki=$wiki --print-orphaned-records-to=/tmp/urbanecm/$wiki-orphaned.log --progress-markers > $wiki.log in a tmux at mwmaint1002 (wiki=eswiki; T246539)

Mentioned in SAL (#wikimedia-operations) [2020-12-03T11:57:24Z] <Urbanecm> Start of mwscript extensions/AbuseFilter/maintenance/updateVarDumps.php --wiki=$wiki --print-orphaned-records-to=/tmp/urbanecm/$wiki-orphaned.log --progress-markers > $wiki.log in a tmux at mwmaint1002 (wiki=enwiki; T246539)

@Daimona I just started the two remaining wikis. Once they finish, this can finally be closed \o/

Mentioned in SAL (#wikimedia-operations) [2020-12-05T00:40:33Z] <Urbanecm> End of mwscript extensions/AbuseFilter/maintenance/updateVarDumps.php --wiki=$wiki --print-orphaned-records-to=/tmp/urbanecm/$wiki-orphaned.log --progress-markers > $wiki.log in a tmux at mwmaint1002 (wiki=eswiki; T246539)

Enwiki just finished a few hours ago, so I officially declare this done!