We already have this functionality for WMF wikis, though not as a maintenance script but as a job spec/job[1]. Third-party wikis will need a mechanism to warm their parser cache with Parsoid output, because several extensions and core itself are now beginning to use Parsoid output for views, edits, etc.
So that third-party wikis do not see a performance degradation when they begin using the new backend, we should provide a maintenance script they can run as a first step to prepare their caches with the appropriate Parsoid parser output; when they then switch to the new backend, performance should stay the same as it was with the legacy output.
Ideally, the script should progressively go through the pages on the wiki whose content model is supported by Parsoid, parse them, and save the output in the ParserCache (the ParserCache backend is configurable; see https://gerrit.wikimedia.org/g/mediawiki/core/+/ab1a809acc6633fd7ebd2027688d51c4813754d1/docs/config-schema.yaml#2465).
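For a rough illustration of that configurability (this shows core's existing ParserCache backend selector in LocalSettings.php as an assumed example; it is not necessarily the exact setting the linked schema line refers to):

```
// LocalSettings.php: choose the object cache backend used by the ParserCache.
// CACHE_DB stores entries in the database via SqlBagOStuff.
$wgParserCacheType = CACHE_DB;
```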
Because wikis can be fairly large, the script should operate on pages in batches (say, 100 per batch) rather than attempting such an operation on millions of pages at once.
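A minimal sketch of what such a script could look like, assuming it lives in maintenance/ of core (class, file, and helper names are placeholders, not the final implementation). It shows the batching and option handling described above; the actual "parse with Parsoid and store in the ParserCache" step is stubbed out, since the exact service call depends on what core ends up exposing:

```
<?php
require_once __DIR__ . '/Maintenance.php';

use MediaWiki\MediaWikiServices;

class WarmParsoidParserCache extends Maintenance {

	public function __construct() {
		parent::__construct();
		$this->addDescription( 'Warm the ParserCache with Parsoid parser output.' );
		$this->addOption( 'force', 'Parse even if the ParserCache already has an entry' );
		$this->addOption( 'namespace', 'Only parse pages in the given namespace', false, true );
		$this->addOption( 'start-from', 'Page ID to start from', false, true );
		$this->setBatchSize( 100 );
	}

	public function execute() {
		$services = MediaWikiServices::getInstance();
		$dbr = $this->getDB( DB_REPLICA );
		$lastId = (int)$this->getOption( 'start-from', 0 );
		$force = $this->hasOption( 'force' );

		$conds = [];
		if ( $this->hasOption( 'namespace' ) ) {
			// Accept a namespace name (e.g. "MediaWiki") and map it to its index.
			$nsIndex = $services->getContentLanguage()->getNsIndex( $this->getOption( 'namespace' ) );
			if ( $nsIndex === false ) {
				$this->fatalError( 'Unknown namespace' );
			}
			$conds['page_namespace'] = $nsIndex;
		}

		do {
			// Fetch one batch of page IDs, ordered so batches do not overlap.
			$rows = $dbr->select(
				'page',
				[ 'page_id' ],
				array_merge( $conds, [ 'page_id > ' . $dbr->addQuotes( $lastId ) ] ),
				__METHOD__,
				[ 'ORDER BY' => 'page_id', 'LIMIT' => $this->getBatchSize() ]
			);

			foreach ( $rows as $row ) {
				$lastId = (int)$row->page_id;
				$page = $services->getWikiPageFactory()->newFromID( $lastId );
				if ( !$page ) {
					continue;
				}
				$this->warmPage( $page, $force );
			}

			$this->output( "... processed up to page ID $lastId\n" );
			$this->waitForReplication();
		} while ( $rows->numRows() === $this->getBatchSize() );
	}

	/**
	 * Stub: the real script would check that the page's content model is
	 * supported by Parsoid, parse the page with Parsoid (e.g. via the same
	 * code path as the prewarm job [1]), and save the result to the
	 * ParserCache, honouring $force for pages that already have an entry.
	 */
	private function warmPage( $page, bool $force ) {
		// Intentionally left empty in this sketch.
	}
}

$maintClass = WarmParsoidParserCache::class;
require_once RUN_MAINTENANCE_IF_MAIN;
```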
Options/Flags
- --force - force a parse even if there is already an entry in the ParserCache
- --namespace X - parse only pages in the given namespace. Example: --namespace MediaWiki
- --start-from X - the page ID to start parsing from
- More TBA.
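For illustration, an invocation of such a script might look like the following (the script file name is hypothetical):

```
php maintenance/warmParsoidParserCache.php --namespace MediaWiki --start-from 12345 --force
```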
[1] https://gerrit.wikimedia.org/r/c/mediawiki/core/+/806443