We should only dump up to N entities in each maintenance script run, and then start a new dumper instance at the offset where the previous run stopped.
This has several benefits:
- If a script fails, we only need to redo the last batch of N entities, not the whole dump.
- Given some grace time, we can react cleanly when a DB (or another backend) goes down or changes; even without grace time, the previous point helps here.
- All shards will run at roughly the same speed: because they switch DB replicas and external storage replicas throughout the dump, landing on a slower replica at some point has much less effect.
- Memory leaks and other problems with long-running PHP/MediaWiki processes don't bite us as hard.
- …
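A minimal sketch of the batching driver described above, in Python for illustration only. It assumes entities can be addressed by a contiguous offset, and that `run_batch(offset, limit)` (a hypothetical callable) performs one dumper run, e.g. by spawning a fresh maintenance script process, and raises on failure. No option names of the real dump script are assumed here.

```python
from typing import Callable, List, Tuple

# Hypothetical: run_batch(offset, limit) performs one dumper run
# (in practice a fresh maintenance script process) and raises on failure.
BatchRunner = Callable[[int, int], None]


def dump_in_batches(
    total_entities: int,
    batch_size: int,
    run_batch: BatchRunner,
    max_attempts: int = 3,
) -> List[Tuple[int, int]]:
    """Dump entities [0, total_entities) in batches of at most batch_size.

    Each batch is handed to run_batch separately, so a failure only costs
    a retry of that one batch, never of the batches that already finished.
    Returns the (offset, limit) pairs that completed, in order.
    """
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    completed: List[Tuple[int, int]] = []
    offset = 0
    while offset < total_entities:
        limit = min(batch_size, total_entities - offset)
        for attempt in range(1, max_attempts + 1):
            try:
                run_batch(offset, limit)
                break  # this batch succeeded
            except Exception:
                if attempt == max_attempts:
                    raise  # give up; earlier batches remain valid
        completed.append((offset, limit))
        offset += limit  # the next run starts where this one stopped
    return completed
```

Because every call is a separate run, a failure only repeats one batch, and each new process starts with a fresh PHP heap and can pick up whichever replicas are currently available.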
I suggest picking N so that a dumper runs for about 15-30 minutes before exiting and handing over to the next run.
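To turn the 15-30 minute target into a concrete N, one rough approach is to measure throughput from an earlier run and multiply it by the target duration. The function names and the example rate below are hypothetical.

```python
import math


def measured_rate(entities_done: int, elapsed_seconds: float) -> float:
    """Entities per second observed in a previous dumper run."""
    if elapsed_seconds <= 0:
        raise ValueError("elapsed_seconds must be positive")
    return entities_done / elapsed_seconds


def batch_size_for(entities_per_second: float, target_minutes: float) -> int:
    """Pick N so that one run lasts roughly target_minutes at the given rate."""
    if entities_per_second <= 0 or target_minutes <= 0:
        raise ValueError("rate and duration must be positive")
    return max(1, math.floor(entities_per_second * target_minutes * 60))
```

For instance, at a hypothetical 200 entities per second, `batch_size_for(200, 15)` gives 180000 and `batch_size_for(200, 30)` gives 360000. Throughput likely differs between shards and over time, so aiming for the middle of the range leaves some headroom.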
One problem that needs solving, or that we at least need to be aware of: if a new Wikibase version gets deployed mid-dump, the serialization format might not be consistent within a single dump.
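For the "at least be aware" case, a possible sketch: record which serialization format each batch was written with and check for a mix afterwards. This assumes each batch can record some format identifier (hypothetically, the deployed Wikibase version); `BatchResult`, `is_consistent` and `stale_batches` are illustrative names, not existing Wikibase code.

```python
from dataclasses import dataclass
from typing import Dict, List, Sequence


@dataclass(frozen=True)
class BatchResult:
    offset: int
    limit: int
    # Hypothetical identifier of the serialization format in use,
    # e.g. the Wikibase version deployed when the batch ran.
    format_version: str


def batches_by_format(results: Sequence[BatchResult]) -> Dict[str, List[BatchResult]]:
    """Group completed batches by the format version they were written with."""
    groups: Dict[str, List[BatchResult]] = {}
    for r in results:
        groups.setdefault(r.format_version, []).append(r)
    return groups


def is_consistent(results: Sequence[BatchResult]) -> bool:
    """True if every batch used the same format version (or there are none)."""
    return len(batches_by_format(results)) <= 1


def stale_batches(results: Sequence[BatchResult]) -> List[BatchResult]:
    """Batches whose format differs from the last batch's.

    Assumes results are in dump order and the last batch reflects the
    newest deployed version.
    """
    if not results:
        return []
    current = results[-1].format_version
    return [r for r in results if r.format_version != current]
```

Since batches are independent, one option would be to redo only the stale batches after a mid-dump deploy rather than the whole dump.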