
Support output files in mwscript-k8s
Open, Low, Public

Description

Some maintenance scripts assume that outputs can be written to the local file system for later use (e.g., by other scripts). This of course does not work today, since although API objects associated with a completed pod remain (as do logs), the overlay file system does not.

While they are used relatively rarely, the scripts that generate title-case mapping overrides are one such example. In T372603 (mapping for the 7.4 to 8.1 migration), we used something of a hack: the contents of generateUpperCharTable.php were run inline in the shell.php REPL and written to /tmp, and then, while the REPL was still running, the results were copied out with kubectl cp.

It would be nice if this could be done a bit less awkwardly, even if that simply means adding an (optional) pause somewhere (to avoid the shell.php trick and allow direct invocation of the desired script), and displaying example commands for copying out files.
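As a rough illustration of the "display example commands" idea, here is a minimal sketch of a helper that formats a kubectl cp command line for a given pod and file. The function name, its parameters, and the namespace, pod, and container values in the example call are all hypothetical; nothing here reflects how mwscript-k8s is actually implemented.

```python
import shlex


def kubectl_cp_command(namespace, pod, remote_path, local_path, container=None):
    """Build a `kubectl cp` command line that copies remote_path out of a pod.

    All arguments are supplied by the caller; this helper only formats the
    command and does not run it.
    """
    args = ["kubectl", "cp", f"{namespace}/{pod}:{remote_path}", local_path]
    if container:
        # -c selects the container when the pod has more than one.
        args += ["-c", container]
    # shlex.join quotes any argument that contains shell metacharacters.
    return shlex.join(args)


# Hypothetical namespace, pod, and container names, for illustration only.
print(kubectl_cp_command(
    "example-namespace",
    "example-pod",
    "/tmp/output.json",
    "./output.json",
    container="example-container",
))
```

A wrapper could print a line like this while the pod is paused, so that the operator can copy the file out before the overlay file system goes away. Note that kubectl cp relies on a tar binary being present in the container image.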

I don't know how widely something like this is needed, but from a naive spot check for file_put_contents use in maintenance/, it seems like other use cases likely exist.
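For anyone who wants to repeat that spot check, a minimal sketch follows. The function name is hypothetical, and the root directory is whatever local checkout path the caller passes in. It only finds literal occurrences of file_put_contents in .php files, so it would miss scripts that write through other paths such as fopen/fwrite or a helper class.

```python
from pathlib import Path


def find_file_writers(root, needle="file_put_contents"):
    """Return (path, line number, line) for each line under root containing needle.

    This is a plain substring search over *.php files, not a PHP parser, so
    matches inside comments or strings are reported too.
    """
    hits = []
    for path in sorted(Path(root).rglob("*.php")):
        text = path.read_text(encoding="utf-8", errors="replace")
        for lineno, line in enumerate(text.splitlines(), start=1):
            if needle in line:
                hits.append((str(path), lineno, line.strip()))
    return hits
```

Calling find_file_writers("maintenance") from the root of a checkout would list candidate scripts to review by hand.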

Event Timeline

Triaging this as low-priority initially, since no other critical use cases for this functionality have surfaced yet, and the title-case mapping use case is rare / has a viable workaround.

Hey. I wanted to point out that this is a feature that I actively use currently with migrateESRefToContentTable.php and MigrateESRefToAflTable.php.

There are two main use cases. The first is making a dump of the data that the script deletes, so that it can be restored more easily in the event of corruption. The second is saving data about rows that are to be "worked on" further (e.g., deleted) later.

The output files of these scripts can easily be up to 100GB.

There are other scripts, not written by me, such as moveToExternal.php, that have also saved data to disk in this way in the past (in that case, the actual writing happens in the UndoLog class).

The workflow to update the interwiki cache also relies on saving the output of a maintenance script to a file.