
dumpRdf.php generates error messages when dumping pages without mediainfo items
Closed, Resolved, Public

Description

For pages (in namespace 6) with no mediainfo slot, an error message of the following form is logged:

[failed-to-dump]: Failed to dump M70620 (Entity not found: M70620)

Given that most pages on Commons don't yet have mediainfo slots, this is going to be 50 million log entries. Can we silence this specific entry somehow, maybe via a command-line flag?

Sample command run (as the dumpsgen user from snapshot01 in beta):

php /srv/mediawiki/multiversion/MWScript.php extensions/Wikibase/repo/maintenance/dumpRdf.php --wiki commonswiki --shard 1 --sharding-factor 2 --batch-size 70000 --format nt --flavor full-dump --entity-type mediainfo --no-cache --dbgroupdefault dump --part-id '1-2' 1 70000 2>> /var/lib/dumpsgen/mediainfo-dumps-log-ttl.txt | gzip > /mnt/dumpsdata/xmldatadumps/temp/mediainfo-dumps-test-nt.gz

Event Timeline

Out of curiosity, what errors wind up in logstash from this script, if any?

AFAICS error messages are written to php://stderr by default, which I would expect to end up in logstash. There's a --log option in dumpRdf.php that lets you specify where error messages are written, so I think you can probably send them to a file instead of logstash. Would that work for you? You should probably test this first with --limit (I don't have logstash locally, so I can't be sure I'm right).
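A small test run along those lines might look like the sketch below. It only prints the command rather than executing it. The --log and --limit options are the ones mentioned above; the log path and the limit of 100 are placeholder assumptions, and the remaining arguments are copied from the sample command in the description.

```shell
# Dry-run sketch: print a small test invocation instead of executing it.
# LOG_FILE and LIMIT are placeholder values (assumptions), not production settings.
LOG_FILE="/tmp/dumpRdf-errors.log"
LIMIT=100

build_test_cmd() {
  # Emit the full command line as a single string for review.
  printf '%s\n' "php /srv/mediawiki/multiversion/MWScript.php extensions/Wikibase/repo/maintenance/dumpRdf.php --wiki commonswiki --format nt --flavor full-dump --entity-type mediainfo --log $LOG_FILE --limit $LIMIT"
}

build_test_cmd
```

If the printed command looks right, it can be run by hand as the dumpsgen user, and the log file checked afterwards to confirm the errors landed there rather than on stderr.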

It would work to get a test dump out, and yeah I'll do a little test first. But for production I'd like to be able to not write them at all, no point to it.

Aha! I found another flag, --ignore-missing, which I think does exactly what you need:

$this->addOption(
	'ignore-missing',
	'Ignore missing IDs, do not report errors on them',
	false,
	false
);
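For illustration, the flag could be added to the dump invocation roughly as follows. This is a dry-run sketch that only prints the command. The arguments are a shortened copy of the sample command in the description, and the sharding, batching, and output redirection from that command are left out here for brevity.

```shell
# Dry-run sketch: print the dump command with --ignore-missing appended.
# The argument list is abbreviated from the sample command in the description.
build_dump_cmd() {
  # --ignore-missing takes no value: the option is declared above with
  # both "required" and "with argument" set to false.
  printf '%s\n' "php /srv/mediawiki/multiversion/MWScript.php extensions/Wikibase/repo/maintenance/dumpRdf.php --wiki commonswiki --format nt --flavor full-dump --entity-type mediainfo --ignore-missing"
}

build_dump_cmd
```

With the flag set, the expectation from the option description is that pages without a mediainfo slot are skipped without a [failed-to-dump] entry, so no separate log redirection should be needed for them.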

Brilliant! I'll be doing some fun things tomorrow then. Thanks!

Ah yes it is! The flag does all it needs to, sorry about that.

Cparle claimed this task.

Great :)