mwdumper should use bulk load optimizations
Closed, Declined, Public, Feature

Description

We can improve MySQL load times by using SET autocommit = 0 and disabling key checks. It might be worth benchmarking LOAD DATA for our use case. I thought I remembered a mysql-fast-import utility, but nothing immediately jumps out on the Internet.
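As a rough sketch of the first idea, the generated INSERT statements could be wrapped in session settings along these lines. This assumes "disabling key checks" means MySQL's `unique_checks` and `foreign_key_checks` session variables; the helper name `wrap_mysql_bulk_load` and the sample `page` row are hypothetical and are not part of mwdumper.

```python
def wrap_mysql_bulk_load(statements):
    """Surround INSERT statements with session settings for a faster bulk load.

    Assumption: "key checks" is read as unique_checks / foreign_key_checks.
    """
    preamble = [
        "SET autocommit = 0;",          # commit once at the end, not per statement
        "SET unique_checks = 0;",       # skip uniqueness checks during the load
        "SET foreign_key_checks = 0;",  # skip foreign key validation during the load
    ]
    postamble = [
        "COMMIT;",
        "SET unique_checks = 1;",       # restore the checks for the session
        "SET foreign_key_checks = 1;",
    ]
    return "\n".join(preamble + list(statements) + postamble) + "\n"


# Hypothetical sample statement, for illustration only.
script = wrap_mysql_bulk_load(
    ["INSERT INTO page (page_id, page_title) VALUES (1, 'Main_Page');"]
)
print(script)
```

Whether this helps enough to matter would still need benchmarking against LOAD DATA, as noted above.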

PostgreSQL can bulk load using the COPY command, reading from a CSV file or CSV-formatted rows inlined in the SQL file.
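A minimal sketch of the inlined-rows variant, assuming the output is fed to `psql` (which ends inline COPY data at a line containing only `\.`). The helper name `pg_copy_block` and the sample rows are hypothetical, and table and column names are assumed to be trusted, since they are not quoted here.

```python
import csv
import io


def pg_copy_block(table, columns, rows):
    """Return a psql-style script fragment with CSV rows inlined after COPY."""
    buf = io.StringIO()
    # The csv module handles quoting of commas and double quotes in fields.
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerows(rows)
    header = f"COPY {table} ({', '.join(columns)}) FROM stdin WITH (FORMAT csv);\n"
    # A line containing only "\." marks the end of the inline data.
    return header + buf.getvalue() + "\\.\n"


# Hypothetical sample rows, for illustration only.
fragment = pg_copy_block(
    "page",
    ["page_id", "page_title"],
    [(1, "Main_Page"), (2, 'Say "hi", world')],
)
print(fragment)
```

Reading from a separate CSV file instead would drop the inline rows and name the file in the COPY statement.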

It would be awesome if we could make our CSVs compatible between MySQL and PostgreSQL and bundle static SQL scripts for each backend, but I'm not sure offhand whether that's possible.

Event Timeline

Aklapper changed the subtype of this task from "Task" to "Feature Request". Feb 6 2022, 5:56 PM
hashar subscribed.

mwdumper can no longer process dumps generated since MediaWiki 1.31 (released in June 2018). The tool was started in 2005 and is no longer maintained, so it is being archived. See T351228 for reference.