
import-dump command fails due to broken sudo file permissions
Closed, Resolved · Public

Description

Use case: import a Wikidata dump into a MediaWiki Vagrant instance running on a Cloud-VPS machine.

TL;DR: is there a workaround or an alternative import method to avoid doing this manually? :-)


Details

cd /srv/mediawiki-vagrant
vagrant import-dump wikidatawiki-latest-pages-articles19.xml-p19072452p19140743.bz2

returns

sudo: /usr/bin/sudo must be owned by uid 0 and have the setuid bit set
sudo: /usr/bin/sudo must be owned by uid 0 and have the setuid bit set

Changing the /usr/bin/sudo bits in the Vagrant box, as suggested in:
https://stackoverflow.com/questions/16682297/getting-message-sudo-must-be-setuid-root-but-sudo-is-already-owned-by-root/19306929#19306929
https://askubuntu.com/questions/452860/usr-bin-sudo-must-be-owned-by-uid-0-and-have-the-setuid-bit-set#471503

requires root privileges, of course, and I'm not sure it's a good workaround. Anyway, the default root password is not vagrant, as mentioned in:
https://www.mediawiki.org/wiki/MediaWiki-Vagrant#Basic_usage
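For reference, the check behind the error message can be sketched in Bash. This is an illustrative re-implementation, not sudo's actual code; it assumes GNU coreutils `stat`:

```shell
#!/bin/bash
# Illustrative re-implementation of the check behind the sudo error message;
# not sudo's actual code. Assumes GNU coreutils `stat`.
check_setuid_root() {
    local f=$1
    local owner mode
    owner=$(stat -c '%u' "$f")   # numeric owner uid (0 means root)
    mode=$(stat -c '%a' "$f")    # octal permission bits, e.g. 4755
    if [ "$owner" -eq 0 ] && (( 8#$mode & 8#4000 )); then
        echo "ok: $f is owned by uid 0 with the setuid bit set"
    else
        echo "broken: $f must be owned by uid 0 and have the setuid bit set"
    fi
}

# Demo on a fresh temp file (no setuid bit), which reports "broken".
demo=$(mktemp)
check_setuid_root "$demo"
rm -f "$demo"
```

A healthy sudo binary shows up in `ls -l` as `-rwsr-xr-x ... root root` (the `s` is the setuid bit), which is what `chown root:root` plus `chmod 4755` restores.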

Here is some digging that may be useful.

  • the actual Bash script run in the Vagrant box is /srv/mediawiki-vagrant/puppet/modules/mediawiki/files/import-mediawiki-dump;
  • the script calls mwscript:
/usr/local/bin/mwscript importDump.php --uploads -- "$FILE"
/usr/local/bin/mwscript rebuildrecentchanges.php
  • mwscript is the one calling sudo; see lines 4 and 5:
# Ensure that the script is run as the www-data user
[[ $(whoami) = www-data ]] || exec sudo --preserve-env -u www-data -n -- "$0" "$@"
  • directly calling /var/www/w/MWScript.php from inside the Vagrant box also fails:
vagrant ssh
php /var/www/w/MWScript.php importDump.php --uploads /vagrant/wikidatawiki-latest-pages-articles19.xml-p19072452p19140743.bz2

returns

Cannot run a MediaWiki script as a user in the group vagrant
Maintenance scripts should generally be run using sudo -u www-data which
is available to all wikidev users.  Running a maintenance script as a
privileged user risks compromise of the user account.

You should run this script as the www-data user:

 sudo -u www-data -- <command>
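The guard quoted from mwscript above is a common Bash idiom: if the script is not already running as the target user, it replaces itself with a sudo re-invocation. A generic sketch (for a safe demo, the target user defaults to the current user here; on MediaWiki-Vagrant it would be www-data):

```shell
#!/bin/bash
# Generic sketch of the "re-exec as another user" guard used by mwscript.
# TARGET_USER defaults to the current user so this demo runs without sudo;
# on MediaWiki-Vagrant the target would be www-data.
TARGET_USER=${TARGET_USER:-$(whoami)}

if [ "$(whoami)" != "$TARGET_USER" ]; then
    # Replace this process with the same script, re-run as $TARGET_USER.
    exec sudo --preserve-env -u "$TARGET_USER" -n -- "$0" "$@"
fi

echo "running as $(whoami)"
```

Because the re-exec goes through sudo, a broken /usr/bin/sudo makes every mwscript invocation fail, which is why the original error appeared twice (once per maintenance script).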

Related to T76041.

Event Timeline

sudo: /usr/bin/sudo must be owned by uid 0 and have the setuid bit set should never happen. This is pretty obviously a corrupted LXC container. If you run vagrant destroy -f && vagrant up && vagrant import-dump ... can you recreate this failure?

@bd808, thanks for the comment. I tried the commands you suggested; now I'm getting a nicer:

bash: import-mediawiki-dump: command not found

On a local test server:

$ cd $MY_MWVAGRANT_CHECKOUT
$ ls -l Wikitech-20180111155417.xml
-rw-r--r--@ 1 bd808  staff   6.8K Jan 11 08:54 Wikitech-20180111155417.xml
$ vagrant import-dump Wikitech-20180111155417.xml
Done!
You might want to run rebuildrecentchanges.php to regenerate RecentChanges,
and initSiteStats.php to update page and revision counts
Rebuilding $wgRCMaxAge=7776000 seconds (90 days)
Clearing recentchanges table for time range...
Loading from page and revision tables...
Inserting from page and revision tables...
Updating links and size differences...
Loading from user, page, and logging tables...
Flagging bot account edits...
Flagging auto-patrolled edits...
Removing duplicate revision and logging entries...
Deleting feed timestamps.
Done.

One thing I just realized is that vagrant import-dump does not currently handle additional arguments like --wiki=wikidatawiki, which would be needed to import the dump to a wiki other than the default (dbname "wiki"). You may be better off using mwscript and importDump.php manually inside the VM with something like:

$ vagrant ssh
$ cd /vagrant
$ mwscript importDump.php --wiki=wikidatawiki --uploads $PATH_TO_DUMP

Yeah, I was wondering the same thing. Will try out your suggestions. Thanks again!

mwscript now seems to run inside the vagrant box:

mwscript importDump.php --wiki=wikidatawiki --uploads wikidatawiki-20171220-pages-articles19.xml-p19072452p19140743.bz2

but complains:

PHP Warning:  fopen(compress.bzip2://wikidatawiki-20171220-pages-articles19.xml-p19072452p19140743.bz2): failed to open stream: operation failed in /vagrant/mediawiki/maintenance/importDump.php on line 276
PHP Stack trace:
PHP   1. {main}() /var/www/w/MWScript.php:0
PHP   2. require_once() /var/www/w/MWScript.php:95
PHP   3. require_once() /vagrant/mediawiki/maintenance/importDump.php:350
PHP   4. BackupReader->execute() /vagrant/mediawiki/maintenance/doMaintenance.php:94
PHP   5. BackupReader->importFromFile() /vagrant/mediawiki/maintenance/importDump.php:114
PHP   6. fopen() /vagrant/mediawiki/maintenance/importDump.php:276

Note that:

$ ls -l /srv/mediawiki-vagrant/
-rw-rw-r--  1 mwvagrant wikidev  29M Jan 11 16:29 wikidatawiki-20171220-pages-articles19.xml-p19072452p19140743.bz2

Likely one of:

  • the file is not readable in the VM for some reason (NFS/VirtualBox failure?)
  • php in the VM does not have bzip support (configuration problem?)
  • the archive is corrupt
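Those three hypotheses can be triaged from inside the VM; a sketch as a Bash function (the diagnostic messages are examples; `php -m` lists loaded extensions and `bzip2 -t` tests archive integrity):

```shell
#!/bin/bash
# Illustrative triage for the three hypotheses above; messages are examples.
triage_dump() {
    local dump=$1
    if [ ! -r "$dump" ]; then
        echo "not readable: check the NFS/VirtualBox share"
    elif ! bzip2 -t "$dump" 2>/dev/null; then
        echo "archive corrupt: re-download the dump"
    elif command -v php >/dev/null && ! php -m | grep -qi bz2; then
        echo "php lacks bz2 support: check the PHP build/configuration"
    else
        echo "no obvious problem found"
    fi
}

# Demo on a small, valid archive created on the fly.
printf '<mediawiki/>' | bzip2 > /tmp/triage-demo.bz2
triage_dump /tmp/triage-demo.bz2
rm -f /tmp/triage-demo.bz2
```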

The next workaround I would try is decompressing the dump before importing.
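That workaround could look something like this (a sketch: `bunzip2 -k` keeps the compressed archive around, and the filename is the one from the listing above):

```shell
#!/bin/bash
# Sketch of the decompress-first workaround; run inside `vagrant ssh`, in /vagrant.
dump=wikidatawiki-20171220-pages-articles19.xml-p19072452p19140743.bz2
if [ -e "$dump" ]; then
    bunzip2 -k "$dump"   # -k keeps the .bz2; writes the decompressed .xml alongside
    mwscript importDump.php --wiki=wikidatawiki --uploads "${dump%.bz2}"
fi
```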

Hjfocs claimed this task.

Nice, making importDump.php read from standard input seems to work:

bzcat wikidatawiki-20180101-pages-articles19.xml-p19072452p19140743.bz2 | mwscript importDump.php --wiki=wikidatawiki --uploads --debug --report 10000

although lots of errors like the one below are thrown:

Revision 32073113 using content model wikitext cannot be stored on "Translations:Wikidata:Glossary/23/sr" on this wiki, since that model is not supported on that page.

Now trying to import the whole Wikidata dump. As a side note, the script got stuck for almost two days on wikidatawiki-20180101-pages-articles27.xml-p37586178p39086178.bz2; after killing the import of that file, the import resumed with the next one.
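For a multi-part import like this, each per-file import could be wrapped in coreutils timeout so a hung part is abandoned automatically; a sketch (the glob, the 6-hour limit, and the skip-and-continue policy are all assumptions):

```shell
#!/bin/bash
# Sketch: import each dump part, abandoning any part that hangs.
# The glob, wiki name, and 6h limit are assumptions; adjust as needed.
for dump in wikidatawiki-20180101-pages-articles*.bz2; do
    [ -e "$dump" ] || continue   # glob matched nothing
    echo "importing $dump"
    if ! bzcat "$dump" | timeout 6h mwscript importDump.php \
            --wiki=wikidatawiki --uploads --report 10000; then
        echo "skipped $dump (failed or timed out)" >&2
    fi
done
```

timeout exits with status 124 when the command is killed for running too long, which the `if !` catches along with ordinary failures.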
Thanks once more, @bd808, for your invaluable help.