The migration path should be incremental so that the work can be paced, risk stays low, and service is not disrupted. Old dumps can only be phased out once the replacement has proven reliable, and running redundant dumps in parallel for long would be expensive.
Here is one straw-person proposal for how to migrate:
- Pick a pilot dump format.
- Write a module that can process an arbitrary chunk of that dump's work.
- Write a lightweight Celery integration that can spawn these pilot jobs and consolidate their output.
- Milestone: Produces a completed dump for some wiki, in some format.
- Optimize filesystem operations.
- Milestone: Completes a dump of a large wiki in reasonable time. Feasibility is demonstrated.
- Create a shareable development environment and write documentation.
- Put some love into the pluggable architecture.
- Write a second and third dump format plugin.
- Milestone: Regular jobs run for multiple formats. We can begin to deprecate Dumps 1.0 now.
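The first few steps have a chunk-and-consolidate shape, which can be sketched as follows. This is a minimal sketch under stated assumptions. A chunk is modeled as a contiguous range of page IDs, and `Chunk`, `plan_chunks`, `process_chunk`, `consolidate`, and `run_dump` are hypothetical names that are not part of any existing dumps code. A standard-library thread pool stands in for Celery so that the example runs on its own.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass
from typing import Iterable, List


# Hypothetical: one unit of dump work is a contiguous range of page IDs.
@dataclass(frozen=True)
class Chunk:
    start_page: int  # inclusive
    end_page: int    # exclusive


def plan_chunks(max_page_id: int, chunk_size: int) -> List[Chunk]:
    """Split page IDs 1..max_page_id into contiguous chunks of at most chunk_size."""
    return [
        Chunk(start, min(start + chunk_size, max_page_id + 1))
        for start in range(1, max_page_id + 1, chunk_size)
    ]


def process_chunk(chunk: Chunk) -> str:
    """Hypothetical pilot-format worker: renders one chunk as text.

    A real worker would read revisions for these pages and write a partial dump file.
    """
    return "".join(f"page {pid}\n" for pid in range(chunk.start_page, chunk.end_page))


def consolidate(parts: Iterable[str]) -> str:
    """Join partial outputs, already in chunk order, into one complete dump."""
    return "".join(parts)


def run_dump(max_page_id: int, chunk_size: int, workers: int = 4) -> str:
    chunks = plan_chunks(max_page_id, chunk_size)
    # Stand-in for Celery: pool.map fans chunks out to workers and
    # returns results in the same order as the input chunks.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        parts = list(pool.map(process_chunk, chunks))
    return consolidate(parts)


if __name__ == "__main__":
    print(run_dump(max_page_id=10, chunk_size=3), end="")
```

With Celery, the fan-out and consolidation would map roughly onto a chord: a group of `process_chunk` tasks whose results are passed to a `consolidate` callback. If the worker is kept a pure function of its chunk, the same module can run under either executor.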