
Figure out why we can't dump labswiki, aka Wikitech
Closed, Resolved · Public

Description

A quick perusal of https://dumps.wikimedia.org/backup-index.html reveals we are not dumping labswiki, aka Wikitech.

Slack conversation where we found this behavior.

For Dumps 2.0, we had also disabled a wiki whose content was not available in the Analytics Replicas, but that was labtestwiki, not labswiki.

In this task we should investigate and try to fix this for Dumps 2.0 as Wikitech is publicly available and thus should be dumped.

Event Timeline

We seem to not be generating events for it either:

presto> SELECT wiki_id, revision.rev_id from event.mediawiki_page_content_change_v1 where wiki_id = 'labswiki' limit 10;
 wiki_id | rev_id 
---------+--------
(0 rows)

Weird.

@BTullis found the root cause elsewhere:

Now that labswiki is a normal database, on the s6 section, are we able to start dumping it, as per all of the other databases?
It looks like it has been excluded from dumps here: https://github.com/wikimedia/operations-puppet/blob/production/modules/snapshot/manifests/dumps/dblists.pp#L18

If we did that, then we wouldn't have to exclude it from the list of dumps imported to HDFS here: https://gerrit.wikimedia.org/r/c/operations/puppet/+/1075238
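
As I understand it, the dumps scheduler works off plain-text dblist files (one database name per line), which is what the dblists.pp manifest above generates. A quick local sanity check might look like the sketch below; both file paths here are placeholders rather than the actual paths on the snapshot hosts.

# Sketch: check whether labswiki would be scheduled for dumps, assuming the
# scheduler reads plain-text dblist files (one database name per line).
# Both paths below are placeholders, not verified against the snapshot hosts.
ALL_DBLIST = "/srv/mediawiki/dblists/all.dblist"     # placeholder path
SKIP_DBLIST = "/etc/dumps/dblists/skipdbs.dblist"    # placeholder path

def read_dblist(path):
    """Return the set of wiki database names listed in a dblist file."""
    with open(path) as f:
        return {line.strip() for line in f if line.strip() and not line.startswith("#")}

all_wikis = read_dblist(ALL_DBLIST)
skipped = read_dblist(SKIP_DBLIST)

wiki = "labswiki"
if wiki not in all_wikis:
    print(f"{wiki} is not in the all-wikis list at all")
elif wiki in skipped:
    print(f"{wiki} is listed but explicitly skipped")
else:
    print(f"{wiki} should be dumped")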

Change #1077403 had a related patch set uploaded (by Majavah; author: Majavah):

[operations/puppet@production] snapshots: Dump wikitech (labswiki) like any other wiki

https://gerrit.wikimedia.org/r/1077403

Change #1077403 merged by Ladsgroup:

[operations/puppet@production] snapshots: Dump wikitech (labswiki) like any other wiki

https://gerrit.wikimedia.org/r/1077403

Some dumps from labswiki were successful, but not the XML dump.

See https://groups.google.com/a/wikimedia.org/g/ops-dumps/c/HUQvxss5I6k for failure email thread.

See https://dumps.wikimedia.org/labswiki/20241001/ for successful/failed dumps.

From @JAllemandou :

Hi Xabriel,
when this wiki gets successfully into dumps list, can you please let me know?
For the moment we have removed it from the list of imported wikis and it'd be great to remove the special case :)
Thank you!
Joseph

Figuring out where things are running:

pwd
/mnt/dumpsdata/xmldatadumps/private/labswiki
cat lock_20241001 
snapshot1013.eqiad.wmnet 358532

snapshot1013? Interesting, as I was expecting snapshot1010 as per https://wikitech.wikimedia.org/wiki/Dumps/Snapshot_hosts#Current_setup.

Anyhow, logs show examples of:

dumpsgen@snapshot1010:/mnt/dumpsdata/xmldatadumps/private/labswiki/20241001$ tail dumplog.txt 
       (Will retry 2 more times)
Rebooting getText infrastructure failed (DB is set and has not been closed by the Load Balancer) Trying to continue anyways
getting/checking text tt:12155 failed (Generic error while obtaining text for id tt:12155) for revision 12320
       (Will retry 1 more times)
Rebooting getText infrastructure failed (DB is set and has not been closed by the Load Balancer) Trying to continue anyways
getting/checking text tt:12155 failed (Generic error while obtaining text for id tt:12155) for revision 12320
      
Rebooting getText infrastructure failed (DB is set and has not been closed by the Load Balancer) Trying to continue anyways
getting/checking text tt:12156 failed (Generic error while obtaining text for id tt:12156) for revision 12321
       (Will retry 4 more times)

Many of them:

cat dumplog.txt | grep "Trying to continue anyways" | wc -l
100474
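
Since the same text addresses keep reappearing across retries, it might be more telling to count distinct failing tt: addresses rather than raw retry lines. A rough sketch against the log format above:

# Sketch: count distinct failing text addresses in dumplog.txt, based on the
# "getting/checking text tt:NNN failed (...) for revision NNN" lines above.
import re
from collections import Counter

failed = Counter()
with open("/mnt/dumpsdata/xmldatadumps/private/labswiki/20241001/dumplog.txt") as log:
    for line in log:
        m = re.search(r"getting/checking text (tt:\d+) failed", line)
        if m:
            failed[m.group(1)] += 1

print(f"{len(failed)} distinct text addresses failed")
for addr, count in failed.most_common(10):
    print(addr, count)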

Now let's go to the node running labswiki and nuke all running processes:

ssh snapshot1013.eqiad.wmnet

sudo -u dumpsgen bash

cd /srv/deployment/dumps/dumps/xmldumps-backup

python3 dumpadmin.py --kill --configfile /etc/dumps/confs/wikidump.conf.dumps --wiki labswiki --dryrun
would kill processes ['3655259', '3655265']

python3 dumpadmin.py --kill --configfile /etc/dumps/confs/wikidump.conf.dumps --wiki labswiki

Stale lock was removed automatically.

The dump was picked up automatically rather quickly and appears to be making progress.

labswiki continues to fail, now with:

PROBLEM: labswiki has file labswiki/20241001/labswiki-20241001-pages-meta-history.xml.bz2.inprog at least 4 hours older than lock

snapshot1013? Interesting, as I was expecting snapshot1010 as per https://wikitech.wikimedia.org/wiki/Dumps/Snapshot_hosts#Current_setup.

I have been doing a little more digging into this, so I now understand a bit more why the backups are sometimes running on hosts other than those we expect.

I added some notes to Wikitech about it. The docs should have said that snapshot1010 and snapshot1013 both run the regular version of the dumps, which means that they share the load of everything except enwiki and wikidatawiki. But even the dedicated hosts for those two biggest wikis will join in and dump the remaining wikis once they finish their primary job. I found this out while investigating why snapshot1012 (dedicated to enwiki) is currently dumping frwiki.

I'll try to have a further look into why the page content dumps for wikitech are still failing. https://dumps.wikimedia.org/labswiki/20241001/

image.png (948×896 px, 141 KB)

This seems to be the most relevant error, from /mnt/dumpsdata/xmldatadumps/private/labswiki/20241001/dumplog.txt

MWException from line 730 of /srv/mediawiki/php-1.43.0-wmf.26/maintenance/includes/TextPassDumper.php: Graceful storage failure
#0 /srv/mediawiki/php-1.43.0-wmf.26/maintenance/includes/TextPassDumper.php(956): TextPassDumper->getText('tt:905703', 'wikitext', 'text/x-wiki', 2516)
#1 [internal function]: TextPassDumper->startElement(Resource id #1328, 'text', Array)
#2 /srv/mediawiki/php-1.43.0-wmf.26/maintenance/includes/TextPassDumper.php(498): xml_parse(Resource id #1328, 'i</format>\n    ...', false)
#3 /srv/mediawiki/php-1.43.0-wmf.26/maintenance/includes/TextPassDumper.php(320): TextPassDumper->readDump(Resource id #1327)
#4 /srv/mediawiki/php-1.43.0-wmf.26/maintenance/includes/TextPassDumper.php(189): TextPassDumper->dump(true)
#5 /srv/mediawiki/php-1.43.0-wmf.26/maintenance/includes/MaintenanceRunner.php(703): TextPassDumper->execute()
#6 /srv/mediawiki/php-1.43.0-wmf.26/maintenance/run.php(51): MediaWiki\Maintenance\MaintenanceRunner->run()
#7 /srv/mediawiki/multiversion/MWScript.php(158): require_once
Error from command(s): /usr/bin/php7.4 /srv/mediawiki/multiversion/MWScript.php dumpTextPass.php --wiki=labswiki --stub=gzip:/mnt/dumpsdata/xmldatadumps/public/labswiki/20241001/labswiki-20241001-stub-meta-history.xml.gz --dbgroupdefault=dump --report=1000 --spawn=/usr/bin/php7.4 --output=bzip2:/mnt/dumpsdata/xmldatadumps/public/labswiki/20241001/labswiki-20241001-pages-meta-history.xml.bz2.inprog --full
error from commands: /usr/bin/php7.4 /srv/mediawiki/multiversion/MWScript.php dumpTextPass.php --wiki=labswiki --stub=gzip:/mnt/dumpsdata/xmldatadumps/public/labswiki/20241001/labswiki-20241001-stub-meta-history.xml.gz  --dbgroupdefault=dump --report=1000 --spawn=/usr/bin/php7.4 --output=bzip2:/mnt/dumpsdata/xmldatadumps/public/labswiki/20241001/labswiki-20241001-pages-meta-history.xml.bz2.inprog --full
2024-10-13 23:06:34: labswiki *** exception! error producing xml file(s) pages-meta-history
2024-10-13 23:06:34: labswiki ['Traceback (most recent call last):\n', '  File "/srv/deployment/dumps/dumps-cache/revs/0d1f9be3610716a30b97df2ca671cc246c62c8f2/xmldumps-backup/dumps/runner.py", line 454, in do_run_item\n    item.dump(self)\n', '  File "/srv/deployment/dumps/dumps-cache/revs/0d1f9be3610716a30b97df2ca671cc246c62c8f2/xmldumps-backup/dumps/jobs.py", line 183, in dump\n    done = self.run(runner)\n', '  File "/srv/deployment/dumps/dumps-cache/revs/0d1f9be3610716a30b97df2ca671cc246c62c8f2/xmldumps-backup/dumps/xmlcontentjobs.py", line 916, in run\n    self.run_page_content_commands(commands, runner, \'regular\')\n', '  File "/srv/deployment/dumps/dumps-cache/revs/0d1f9be3610716a30b97df2ca671cc246c62c8f2/xmldumps-backup/dumps/xmlcontentjobs.py", line 743, in run_page_content_commands\n    raise BackupError("error producing xml file(s) %s" % self.get_dumpname())\n', 'dumps.exceptions.BackupError: error producing xml file(s) pages-meta-history\n']

I'm not sure what the cause of the error might be yet. I had a quick look into the MariaDB database permissions, but they seem OK.

I'll try running the commands manually, with a temporary output file and with reference to the dumpTextPass.php script here: https://www.mediawiki.org/wiki/Manual:DumpTextPass.

I can replicate the error by running this as the dumpsgen user.

dumpsgen@snapshot1013:~$ /usr/bin/php7.4 /srv/mediawiki/multiversion/MWScript.php dumpTextPass.php --wiki=labswiki --stub=gzip:/mnt/dumpsdata/xmldatadumps/public/labswiki/20241001/labswiki-20241001-stub-meta-history.xml.gz  --dbgroupdefault=dump --report=1000 --spawn=/usr/bin/php7.4 --output=bzip2:/var/lib/dumpsgen/labswiki-20241001-pages-meta-history.xml.bz2.inprog --full
Spawning database subprocess: '/usr/bin/php7.4' '/srv/mediawiki/php-1.43.0-wmf.26/../multiversion/MWScript.php' 'fetchText.php' '--wiki' 'labswiki'
getting/checking text tt:1 failed (Generic error while obtaining text for id tt:1) for revision 1
      0
      
       (Will retry 4 more times)
^C
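
One thing that might narrow this down: a tt:N address should resolve to a row in the labswiki text table, and old_flags tells us whether the content is stored inline or in external storage. A sketch of a direct check (the host and credentials are placeholders, and this assumes read access to the labswiki database on s6):

# Sketch only: see what the failing blob address tt:1 resolves to in the text table.
# Host and credentials are placeholders; this assumes read access to labswiki.
import pymysql

conn = pymysql.connect(
    host="s6-replica.example",   # placeholder, not a real host
    user="dump_check",           # placeholder
    password="...",
    database="labswiki",
)
with conn.cursor() as cur:
    # tt:<N> should point at text.old_id = <N>; old_flags indicates whether
    # old_text is the content itself or an external-storage pointer (e.g. DB://cluster/...).
    cur.execute("SELECT old_flags, LEFT(old_text, 80) FROM text WHERE old_id = %s", (1,))
    row = cur.fetchone()
    print(row if row else "no text row with old_id=1")
conn.close()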

It's interesting that if I look at the history of Main Page on Wikitech (https://wikitech.wikimedia.org/w/index.php?title=Main_Page&action=history&dir=prev), then the oldest revision I see there is 12270.

When I look at the stub file with zless /mnt/dumpsdata/xmldatadumps/public/labswiki/20241001/labswiki-20241001-stub-meta-history.xml.gz, it says that revision 1 is the original import.

<title>Main Page</title>
<ns>0</ns>
<id>1</id>
<revision>
  <id>1</id>
  <timestamp>2011-06-03T18:44:08Z</timestamp>
  <contributor>
    <username>imported&gt;MediaWiki Default</username>
    <id>0</id>
  </contributor>
  <origin>1</origin>
  <model>wikitext</model>
  <format>text/x-wiki</format>
  <text bytes="438" sha1="4ekt8vw9ti0lmgilborq9q0ubtp396e" location="tt:1" id="1" />
  <sha1>4ekt8vw9ti0lmgilborq9q0ubtp396e</sha1>
</revision>

I'm not sure whether I'm correlating the correct IDs though.
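
To correlate them more systematically, one could pull the (revision id, text location) pairs straight out of the stub file, along these lines (a quick sketch using the stub path above):

# Sketch: list (revision id, text location) pairs from the stub file, to
# correlate revision ids like 1 with blob addresses like tt:1.
import gzip
import xml.etree.ElementTree as ET

STUB = "/mnt/dumpsdata/xmldatadumps/public/labswiki/20241001/labswiki-20241001-stub-meta-history.xml.gz"

def local(tag):
    # Strip the XML namespace so we can match on bare element names.
    return tag.rsplit("}", 1)[-1]

shown = 0
with gzip.open(STUB, "rb") as f:
    for _, elem in ET.iterparse(f, events=("end",)):
        if local(elem.tag) != "revision":
            continue
        rev_id = None
        location = None
        for child in elem:       # direct children only, so contributor/<id> is not picked up
            name = local(child.tag)
            if name == "id" and rev_id is None:
                rev_id = child.text
            elif name == "text":
                location = child.get("location")
        print(rev_id, location)
        elem.clear()             # keep memory use flat on a full-history stub
        shown += 1
        if shown >= 20:          # just the first few; the full history is large
            break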

The timestamps are strange, in that the first revision I can see for that page was made on 2004-06-10 and yet revision 1 from the stub-meta-history stub file was made on 2011-06-03.

image.png (840×1 px, 306 KB)

The timestamps are strange, in that the first revision I can see for that page was made on 2004-06-10 and yet revision 1 from the stub-meta-history stub file was made on 2011-06-03.

image.png (840×1 px, 306 KB)

If you check the revision ids, it's possible to import revisions (as shown by other changes on that screenshot) that will have a higher rev id but a (much) older timestamp.

If you check the revision ids, it's possible to import revisions (as shown by other changes on that screenshot) that will have a higher rev id but a (much) older timestamp.

OK, thanks for that explanation. Makes sense. Any idea why fetchText.php would be failing for this wiki?

I have been trying this command:

/usr/bin/php7.4 /srv/mediawiki/multiversion/MWScript.php fetchText.php --wiki labswiki

...but nothing is shown.

That's probably missing an argument of some sort to specify which text to fetch?

reedy@deploy1003:~$ /usr/bin/php7.4 /srv/mediawiki/multiversion/MWScript.php fetchText.php --wiki labswiki
Cannot run a MediaWiki script as a user in the group wikidev
Maintenance scripts should generally be run using sudo -u www-data which
is available to all wikidev users.  Running a maintenance script as a
privileged user risks compromise of the user account.

You should run this script as the www-data user:

 sudo -u www-data <command>
reedy@deploy1003:~$ mwscript fetchText.php --wiki labswiki
DEPRECATION WARNING: Maintenance scripts are moving to Kubernetes. See
https://wikitech.wikimedia.org/wiki/Maintenance_scripts for the new process.
Maintenance hosts will be going away; please submit feedback promptly if
maintenance scripts on Kubernetes don't work for you. (T341553)
^C
reedy@deploy1003:~$ mwscript fetchText.php --wiki labswiki --help
DEPRECATION WARNING: Maintenance scripts are moving to Kubernetes. See
https://wikitech.wikimedia.org/wiki/Maintenance_scripts for the new process.
Maintenance hosts will be going away; please submit feedback promptly if
maintenance scripts on Kubernetes don't work for you. (T341553)

Fetch the raw revision blob from a blob address.
Integer IDs are interpreted as referring to text.old_id for backwards
compatibility.
NOTE: Export transformations are NOT applied. This is left to dumpTextPass.php

Usage: php /srv/mediawiki-staging/php-1.43.0-wmf.26/maintenance/run.php fetchText.php [OPTION]...

Script runner options:
    --conf <CONF>: Location of LocalSettings.php, if not default
    --globals: Output globals at the end of processing for debugging
    --help (-h): Display this help message
    --memory-limit <MEMORY-LIMIT>: Set a specific memory limit for the
        script, "max" for no limit or "default" to avoid changing it
    --profiler <PROFILER>: Profiler output format (usually "text")
    --quiet (-q): Whether to suppress non-error output
    --server <SERVER>: The protocol and server name to use in URLs, e.g.
        https://en.wikipedia.org. This is sometimes necessary because server
        name detection may fail in command line scripts.
    --wiki <WIKI>: For specifying the wiki ID

Common options:
    --dbgroupdefault <DBGROUPDEFAULT>: The default DB group to use.
    --dbpass <DBPASS>: The password to use for this script
    --dbuser <DBUSER>: The DB user to use for this script
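
If I remember correctly, fetchText.php takes no positional arguments at all: dumpTextPass.php feeds it blob addresses on stdin via the --spawn mechanism and reads the blob back on stdout. So a direct test from the snapshot host might look like the sketch below; the exact reply format is my assumption, not something I have verified.

# Sketch: drive fetchText.php the way dumpTextPass.php's --spawn mechanism does,
# by writing a blob address to its stdin. The reply format (a length line followed
# by the text) is assumed here, not verified.
import subprocess

proc = subprocess.Popen(
    ["/usr/bin/php7.4", "/srv/mediawiki/multiversion/MWScript.php",
     "fetchText.php", "--wiki", "labswiki"],
    stdin=subprocess.PIPE,
    stdout=subprocess.PIPE,
    text=True,
)
out, _ = proc.communicate("tt:1\n", timeout=60)
print(repr(out))  # a healthy reply should include the text of revision 1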
Gehel triaged this task as High priority. Nov 8 2024, 2:22 PM

Noting here that for the 20241101 run, the full dump (aka pages-meta-history) was successful: https://dumps.wikimedia.org/labswiki/20241101/

So the issue from this ticket seems to be a repro of T377594: Fix Dumps - errors exporting good revisions.

One thing I do notice that could be problematic in the mid term is that the full dump went into a single bz2 file of 15.4 GB:

2024-11-09 14:18:31 done All pages with complete page edit history (.bz2)
labswiki-20241101-pages-meta-history.xml.bz2 15.4 GB

So this wiki cannot be categorized as a 'small' wiki; IIRC there is a procedure to categorize it differently so that the dump is split into smaller files.
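
For reference, my understanding is that the 'big wiki' treatment splits the page-content jobs across several output files by partitioning the page ID range. The real part counts and sizes come from the dumps configuration, so the sketch below is only an illustration of the idea:

# Illustration only: splitting a full-history dump into several smaller output
# files by partitioning the page ID range. Real part sizes come from the dumps
# configuration, and the numbers below are made up.
def page_ranges(max_page_id, parts):
    """Yield (start, end) page ID ranges covering 1..max_page_id in `parts` chunks."""
    size = -(-max_page_id // parts)  # ceiling division
    for i in range(parts):
        start = i * size + 1
        end = min((i + 1) * size, max_page_id)
        if start <= end:
            yield start, end

for start, end in page_ranges(max_page_id=40000, parts=4):
    print(f"one pages-meta-history part covering page ids {start}..{end}")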

Change #1090832 had a related patch set uploaded (by Btullis; author: Btullis):

[operations/puppet@production] Revert "Remove labswiki from HDFS imported dumps"

https://gerrit.wikimedia.org/r/1090832

Change #1090834 had a related patch set uploaded (by Btullis; author: Btullis):

[operations/puppet@production] [dumps] - Categorise labswiki (wikitech) as a big wiki

https://gerrit.wikimedia.org/r/1090834

One thing I do notice that could be problematic in the mid term is that the full dump went into a single bz2 file of 15.4 GB:

2024-11-09 14:18:31 done All pages with complete page edit history (.bz2)
labswiki-20241101-pages-meta-history.xml.bz2 15.4 GB

So this wiki cannot be categorized as a 'small' wiki; IIRC there is a procedure to categorize it differently so that the dump is split into smaller files.

I think that this patch will achieve that: https://gerrit.wikimedia.org/r/c/operations/puppet/+/1090834

Change #1090834 merged by Btullis:

[operations/puppet@production] [dumps] - Categorise labswiki (wikitech) as a big wiki

https://gerrit.wikimedia.org/r/1090834

Change #1090832 merged by Btullis:

[operations/puppet@production] Revert "Remove labswiki from HDFS imported dumps"

https://gerrit.wikimedia.org/r/1090832

Resolving this task now. I'll check in mid-December whether the wikitech dump has:

  • Dumped correctly
  • Dumped as a big wiki
  • Been imported into HDFS after dumping