Investigate database/job queue impact of account imports
Closed, Resolved (Public)

Description

While watching the import aftermath on the job queue over the past week or so via the showJobs.php maintenance script, I noticed that importing an account with very few (fewer than five) edits/actions in the database generates over 13,000 reattributeImportedEdits jobs. This seems completely disproportionate, and is probably the cause of the observed/reported difficulties with importing accounts for Yugipedia.

Event Timeline

Dinoguy1000 renamed this task from Investigate database/job queue impact on account imports to Investigate database/job queue impact of account imports. Jul 23 2019, 11:12 PM

What version of MediaWiki are you importing into (target wiki where MWA is installed)?

If 1.31 or later, what is the value of $wgActorTableSchemaMigrationStage on your wiki?

@Skizzerz We're on 1.31.1, and $wgActorTableSchemaMigrationStage is commented out (and has been for months - we originally tried setting it to MIGRATION_WRITE_BOTH, but had to disable it almost immediately because it wasn't playing nice with our millions of imported edits).

It looks like the current behavior of the extension is to generate one job for each block of 300 edits in the wiki, and then try to reattribute edits within each block when that job runs. So, if your wiki has 300,000 edits, each import would generate 1,000 jobs, regardless of how many edits the imported account actually has. This means that the majority of the jobs will actually be doing absolutely nothing, and it just serves to bloat the job queue.

There is certainly a better way of handling this, although I can't give any ETA of when I'll be able to get around to fixing this issue (as well as adding support for the actor migration). I have a couple other projects in front of this one.

Aah, so with 4.125 million edits, it results in 13,750 jobs being generated. This definitely lines up with what I've been seeing.
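
The arithmetic in the two comments above can be checked with a short sketch. It assumes one job per block of 300 edits across the whole wiki, as described; the function name `jobs_per_import` is hypothetical and is not part of the extension.

```python
BLOCK_SIZE = 300  # edits covered by one job, per the comment above


def jobs_per_import(total_edits: int, block_size: int = BLOCK_SIZE) -> int:
    """Jobs queued per import when every block of edits in the wiki gets a job.

    Hypothetical helper for illustration only. Uses integer ceiling division
    so a partial final block still counts as one job.
    """
    if total_edits < 0 or block_size <= 0:
        raise ValueError("total_edits must be >= 0 and block_size must be > 0")
    return -(-total_edits // block_size)


print(jobs_per_import(300_000))    # 1000
print(jobs_per_import(4_125_000))  # 13750
```

Note that the job count depends only on the size of the wiki, not on how many edits the imported account has, which is why an account with fewer than five edits still produces thousands of jobs.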

Change 559286 merged by jenkins-bot:
[mediawiki/extensions/MediaWikiAuth@master] Make edit attribution actor-aware

https://gerrit.wikimedia.org/r/559286

Skizzerz claimed this task.

We now generate jobs proportional to the number of edits that need to be reattributed instead of the total number of edits in the wiki, which should reduce database impact by a lot. Some extra work is needed at import time to do this, however.
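
As a rough illustration of the difference, here is a minimal sketch of batching only the edits that need reattribution. It is not the extension's actual code: the function name `plan_jobs`, the list-of-IDs input, and the reuse of a batch size of 300 are all assumptions made for the example. The "extra work at import time" corresponds to building the input list, which is simply passed in here.

```python
from typing import List, Sequence


def plan_jobs(edit_ids: Sequence[int], batch_size: int = 300) -> List[List[int]]:
    """Split the edits needing reattribution into per-job batches.

    Hypothetical helper: `edit_ids` stands in for whatever lookup the
    import step performs to find the imported account's edits.
    """
    if batch_size <= 0:
        raise ValueError("batch_size must be > 0")
    return [
        list(edit_ids[start:start + batch_size])
        for start in range(0, len(edit_ids), batch_size)
    ]


# An account with four edits yields a single job, whatever the wiki's size.
print(len(plan_jobs([101, 102, 103, 104])))  # 1
# An account with 650 edits yields three jobs (300 + 300 + 50).
print([len(batch) for batch in plan_jobs(list(range(650)))])  # [300, 300, 50]
```

Under these assumptions, an account with no matching edits queues no jobs at all.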