T383962 shows that Special:Import is not fit for purpose. It's not feasible to import thousands of revisions from Incubator in a single POST request, and it's not safe to retry the request after it fails. The current situation has left us with an epic mess in the databases of new wikis.
There are two possible solutions: use the job queue, or segment the file on the client side and import it in chunks. The job queue solution has scalability and performance challenges, whereas the client-side solution appears to scale well.
To implement a pure job queue solution, we would need to accept a potentially large file in a single POST request, store it before the request times out, and then load the data into numerous jobs. The jobs could each download the whole file and filter it, or the initial POST request could segment the file and store a separate chunk for each job. But segmenting and storing the file in a single request costs about as much as just importing it, so the advantage of doing the work asynchronously is limited.
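For illustration, the parameters each job would need might look something like the following (written as a TypeScript interface for brevity; a real implementation would be a PHP Job subclass, and all field names here are assumptions rather than an existing schema):

```typescript
// Hypothetical parameters for one import job, assuming the initial POST
// request stores the uploaded dump (or pre-segmented chunks) somewhere
// the job runners can reach.
interface ImportChunkJobParams {
    // Key of the stored upload (whole file) or of this job's chunk.
    fileKey: string;
    // If the whole file is stored, the byte range this job should
    // download and filter; omitted if the chunk was stored separately.
    byteOffset?: number;
    byteLength?: number;
    // Position of this chunk in the overall import, for ordering and
    // progress reporting.
    chunkIndex: number;
    totalChunks: number;
    // Import options carried over from the original request.
    interwikiPrefix: string;
    assignKnownUsers: boolean;
}
```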
Client-side JS can read a file selected by the user with Blob.stream(), parse it, segment it, and post it in chunks to ApiImport. Importing a multi-gigabyte file would be feasible with this approach.
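A minimal sketch of the segmenting step, assuming the dump is well-formed export XML and that chunks can always be cut at </page> boundaries. A real implementation would also have to preserve the <siteinfo> header and re-wrap each chunk in the <mediawiki> root element before posting; that bookkeeping is omitted here, and the chunk size is illustrative.

```typescript
// Read a user-selected dump file as a stream and yield XML chunks that
// each end on a </page> boundary, so every chunk can be imported
// independently once re-wrapped in the <mediawiki> envelope.
async function* segmentDump(
    file: File,
    targetChunkBytes = 4 * 1024 * 1024
): AsyncGenerator<string> {
    const reader = file.stream()
        .pipeThrough(new TextDecoderStream())
        .getReader();

    let buffer = '';
    for (;;) {
        const { done, value } = await reader.read();
        if (value !== undefined) {
            buffer += value;
        }
        // Emit a chunk whenever enough complete pages have accumulated.
        while (buffer.length >= targetChunkBytes) {
            const cut = buffer.lastIndexOf('</page>');
            if (cut === -1) {
                break; // No complete page yet; keep reading.
            }
            yield buffer.slice(0, cut + '</page>'.length);
            buffer = buffer.slice(cut + '</page>'.length);
        }
        if (done) {
            break;
        }
    }
    if (buffer.trim() !== '') {
        yield buffer; // Trailing chunk, including the closing </mediawiki>.
    }
}
```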
If the user's browser exits part-way through the import, we would want some way to recover, say by saving the current state to IndexedDB when each request is sent.
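A rough sketch of what the resumable session could look like, assuming we only need to record which chunk was last dispatched for a given file. The database, store, and field names are placeholders:

```typescript
// Persist import progress in IndexedDB so that an interrupted import can
// be detected and offered for resumption.
const DB_NAME = 'importProgress';
const STORE = 'sessions';

interface SessionState {
    fileName: string;
    fileSize: number;
    nextChunkIndex: number;
}

function openDb(): Promise<IDBDatabase> {
    return new Promise((resolve, reject) => {
        const req = indexedDB.open(DB_NAME, 1);
        req.onupgradeneeded = () => {
            req.result.createObjectStore(STORE, { keyPath: 'fileName' });
        };
        req.onsuccess = () => resolve(req.result);
        req.onerror = () => reject(req.error);
    });
}

// Record the state as each chunk request is sent.
async function saveState(state: SessionState): Promise<void> {
    const db = await openDb();
    await new Promise<void>((resolve, reject) => {
        const tx = db.transaction(STORE, 'readwrite');
        tx.objectStore(STORE).put(state);
        tx.oncomplete = () => resolve();
        tx.onerror = () => reject(tx.error);
    });
}

// Look up any saved state when the user selects a file again.
async function loadState(fileName: string): Promise<SessionState | undefined> {
    const db = await openDb();
    return new Promise((resolve, reject) => {
        const req = db.transaction(STORE).objectStore(STORE).get(fileName);
        req.onsuccess = () => resolve(req.result as SessionState | undefined);
        req.onerror = () => reject(req.error);
    });
}
```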
Components:
- New form
- Progress and message display UI
- Resumable session
- Upload controller
- Stream segmenter
- Limit revision count and file size in the legacy no-JS implementation
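As a sketch of how the upload controller might tie the stream segmenter and resumable session together (reusing segmentDump, saveState, and loadState from the sketches above), assuming ApiImport can accept each chunk as an uploaded xml parameter; the request parameters and helpers here are illustrative, not the module's actual interface:

```typescript
declare const mw: { util: { wikiScript(name: string): string } };
// Assumed helper; in practice the token would come from mw.Api or
// action=query&meta=tokens.
declare function getCsrfToken(): Promise<string>;

// Drive the import: segment the file, skip chunks already sent in a
// previous session, persist progress before each request, and report
// progress to the UI.
async function runImport(
    file: File,
    onProgress: (chunksSent: number, bytesSent: number) => void
): Promise<void> {
    const previous = await loadState(file.name);
    const startAt = previous?.nextChunkIndex ?? 0;

    let chunkIndex = 0;
    let bytesSent = 0;
    for await (const chunk of segmentDump(file)) {
        if (chunkIndex >= startAt) {
            // Record that we are about to send this chunk, so a crashed
            // browser session can resume from here.
            await saveState({
                fileName: file.name,
                fileSize: file.size,
                nextChunkIndex: chunkIndex,
            });

            // Illustrative request; the real parameters would come from
            // the form and the ApiImport module's definition.
            const body = new FormData();
            body.append('action', 'import');
            body.append('format', 'json');
            body.append('token', await getCsrfToken());
            body.append('xml', new Blob([chunk], { type: 'text/xml' }), file.name);

            const res = await fetch(mw.util.wikiScript('api'), { method: 'POST', body });
            if (!res.ok) {
                throw new Error(`Chunk ${chunkIndex} failed: HTTP ${res.status}`);
            }
        }
        bytesSent += chunk.length;
        chunkIndex += 1;
        onProgress(chunkIndex, bytesSent);
    }
}
```

Saving the state before each POST rather than after matches the recovery idea above; it means the last chunk may be retried on resume, which the server-side import would need to tolerate.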