makeMailingList.php and deduplicateMailingList.php make some assumptions that don't necessarily hold on Kubernetes. The most pressing is that they assume a file can be written in one run and read back in a later one, which is not currently supported. These scripts may need to be rewritten to accommodate that restriction (assuming the restriction isn't lifted).
Description
| Status | Subtype | Assigned | Task |
|---|---|---|---|
| Resolved | Feature | tstarling | T401871 Rewrite `makeMailingList.php` and `deduplicateMailingList.php` to work with Kubernetes restrictions |
| Resolved | | jrbs | T404060 makeMailingList.php creates 30GB of data |
Event Timeline
In past years, these scripts have been critical to the successful completion of the Board of Trustees selection process, as there is a requirement to email eligible voters with information about the vote.
We need to run this script sometime in September or very early October, in order to be able to email the voters by early to mid-October 2025.
There are two paths forward for this in my eyes:
- Someone with PHP skills should help rewrite the script to no longer create a bunch of files and instead do both of these stages in memory somehow, if that's even possible.
- Kubernetes needs to allow us to create and refer to files (writing them to home dir rather than data perhaps?). This is probably a much harder and longer-term project.
We don't have a ton of time to figure this out so I would think route 1 is the ideal way to go.
Can you clarify the process? Is one of these scripts using the output file from the other one as input in two different stages, or are the scripts using the output as temporary files in the same run? The first is not possible without workarounds; the second is.
> - Kubernetes needs to allow us to create and refer to files (writing them to home dir rather than data perhaps?). This is probably a much harder and longer-term project.
This would require persistent state storage in wikikube if the aim is to access these files from different mw-script-k8s runs without manual steps, which we don't plan on supporting right now.
> We don't have a ton of time to figure this out so I would think route 1 is the ideal way to go.
It would be ideal; however, there are ways to get stdout or a file written by the script out of the container [1], as well as to pass files [2] or stdin [3] to a script.
[1] https://wikitech.wikimedia.org/wiki/Maintenance_scripts#Output_to_a_file
[2] https://wikitech.wikimedia.org/wiki/Maintenance_scripts#Input_from_a_file
[3] https://wikitech.wikimedia.org/wiki/Maintenance_scripts#Input_on_stdin
The docs are here: https://wikitech.wikimedia.org/wiki/SecurePoll#Email_spam
- makeMailingList.php is run on all wikis. This outputs one ml-$wiki file per wiki (e.g. ml-enwiki).
- deduplicateMailingList.php is run against only those generated ml- files and dedupes them into a new file, e.g. dedup-all.
- sendMail.php is then used to actually send the email to the addresses in dedup-all. That step I think is fine with k8s.
> We don't have a ton of time to figure this out so I would think route 1 is the ideal way to go.
>
> It would be ideal; however, there are ways to get stdout or a file written by the script out of the container [1], as well as to pass files [2] or stdin [3] to a script.
I could investigate this, but my PHP skills are pretty poor. We might be able to coerce an engineer into helping us, though that would be on us to sort out!
makeMailingList.php writes to stdout by default, and the documented procedure already uses a shell redirect to write that output to a file, so that's fine as is.
deduplicateMailingList.php already reads from stdin and writes to stdout by default. Currently the documented procedure is to pass multiple files on the command line, but the script just reads them line by line sequentially, so they can be piped in with cat instead. You won't get informative line numbers if there's an error, but there won't be an error.
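To illustrate why piping with cat is equivalent for a reader that consumes its input line by line, here is a minimal sketch. It does not run the real script: `dedupe_lines` is a hypothetical stand-in built on awk, and the one-address-per-line file contents are an assumption made only for this example, not the actual ml- file format.

```shell
# Hypothetical stand-in for a sequential, line-by-line deduplicator.
# This is NOT the logic of deduplicateMailingList.php.
dedupe_lines() {
    # With file arguments, awk reads them in order; with none, it reads stdin.
    # Each distinct line is printed only the first time it is seen.
    awk '!seen[$0]++' "$@"
}

# Assumed sample data: one address per line (illustrative only).
workdir=$(mktemp -d)
printf 'a@example.org\nb@example.org\n' > "$workdir/ml-enwiki"
printf 'b@example.org\nc@example.org\n' > "$workdir/ml-dewiki"

# Variant 1: files passed as arguments.
from_args=$(dedupe_lines "$workdir/ml-enwiki" "$workdir/ml-dewiki")

# Variant 2: the same files concatenated and piped in on stdin.
from_pipe=$(cat "$workdir/ml-enwiki" "$workdir/ml-dewiki" | dedupe_lines)

rm -rf "$workdir"

echo "$from_pipe"
```

Both variants see the same sequence of lines, so they produce the same result. The only thing lost in the piped form is the per-file context, which is why error messages can no longer name the file a line came from.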
sendMail.php unconditionally takes the name of a file from its arguments. Either mwscript-k8s --file can be used, or the file name can be given as php://stdin or /dev/stdin.
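The /dev/stdin trick can be sketched in isolation. In the example below, `count_recipients` is a hypothetical stand-in for any program that insists on a file name argument; it is not sendMail.php, and the sample addresses are made up.

```shell
# Hypothetical stand-in for a program that only accepts a file name.
count_recipients() {
    # Counts the lines in the file named by the first argument.
    awk 'END { print NR }' "$1"
}

# No file exists on disk: the list arrives on stdin,
# and /dev/stdin is passed as the "file name".
recipient_count=$(printf 'a@example.org\nb@example.org\n' | count_recipients /dev/stdin)

echo "$recipient_count"
```

This relies on the platform exposing /dev/stdin, as Linux does. php://stdin is the PHP-level equivalent and only applies when the file name is opened through PHP's own stream functions.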
So I think the only thing to do here is to update the documentation. I updated it with a procedure that I think will work.