
Maintenance script to test already uploaded files
Open, Needs Triage, Public

Description

[21:41:22] <godog> FWIW I agree it'd be useful to have "grep this string over every n-th uploaded file"

It'd be really useful to have a way to iterate over all files on a wiki (or a sampled set, with various filters) and pass them to a separate script for processing.

This would be useful for things like T48921 and T12847, to find bad files that have already been uploaded.
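As a rough illustration of the "grep this string over every n-th uploaded file" idea, here is a minimal Python sketch. It is not an existing maintenance script: `every_nth`, `find_matches` and the `read` callback are hypothetical names, and how the file list and file contents are actually obtained is left to the caller.

```python
import itertools
from typing import Callable, Iterable, Iterator, List


def every_nth(paths: Iterable[str], n: int, offset: int = 0) -> Iterator[str]:
    """Yield every n-th path, starting at index `offset` (hypothetical helper)."""
    if n < 1:
        raise ValueError("n must be >= 1")
    return itertools.islice(paths, offset, None, n)


def find_matches(
    paths: Iterable[str], needle: bytes, read: Callable[[str], bytes]
) -> List[str]:
    """Return the paths whose content contains `needle`.

    `read` is supplied by the caller and stands in for whatever fetches
    a file's bytes (local disk, a storage backend, ...).
    """
    return [path for path in paths if needle in read(path)]
```

With a dict standing in for file storage, usage could look like `find_matches(every_nth(sorted(store), 2), b"<script", store.__getitem__)`.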

Event Timeline

We can use $backend->getFileList() and iterate over them all, though this seems suboptimal.

It'd be easier to build a list from the images table (where the filter criteria are easier to get at), and then fetch those files specifically.
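A sketch of what a table-driven selection might look like, in Python with an in-memory SQLite database standing in for the wiki database. The table name `image` and the columns `img_name`, `img_size`, `img_major_mime` and `img_minor_mime` are modelled on MediaWiki's schema, but treat them as assumptions here; `make_demo_db` and `list_candidate_files` are hypothetical helpers.

```python
import sqlite3
from typing import List


def make_demo_db() -> sqlite3.Connection:
    """Build a tiny stand-in table (assumed schema, demo data only)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE image ("
        "img_name TEXT PRIMARY KEY, img_size INTEGER, "
        "img_major_mime TEXT, img_minor_mime TEXT)"
    )
    conn.executemany(
        "INSERT INTO image VALUES (?, ?, ?, ?)",
        [
            ("A.svg", 1200, "image", "svg+xml"),
            ("B.png", 90000, "image", "png"),
            ("C.svg", 300, "image", "svg+xml"),
            ("D.svg", 5000, "image", "svg+xml"),
        ],
    )
    return conn


def list_candidate_files(
    conn: sqlite3.Connection,
    major_mime: str,
    minor_mime: str,
    min_size: int = 0,
    limit: int = 100,
) -> List[str]:
    """Return file names matching the MIME type and minimum size filters."""
    rows = conn.execute(
        "SELECT img_name FROM image "
        "WHERE img_major_mime = ? AND img_minor_mime = ? AND img_size >= ? "
        "ORDER BY img_name LIMIT ?",
        (major_mime, minor_mime, min_size, limit),
    )
    return [name for (name,) in rows]
```

The resulting names could then be resolved to stored files individually, rather than walking the whole backend listing.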

This could also be a job-queue type of workload: each queued item contains a batch of files and the script to run on them, and the results get saved somewhere.
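To make the batching idea concrete, a minimal Python sketch follows. It only shows how a file list could be split into self-describing work items; `FileCheckJob` and `make_jobs` are hypothetical names, and no real job queue API is used or implied.

```python
from dataclasses import dataclass
from typing import Iterable, Iterator, List, Tuple


@dataclass(frozen=True)
class FileCheckJob:
    """One unit of work: the script to run and the files to run it on."""
    script: str
    files: Tuple[str, ...]


def make_jobs(
    files: Iterable[str], script: str, batch_size: int
) -> Iterator[FileCheckJob]:
    """Split `files` into jobs of at most `batch_size` files each."""
    if batch_size < 1:
        raise ValueError("batch_size must be >= 1")
    batch: List[str] = []
    for name in files:
        batch.append(name)
        if len(batch) == batch_size:
            yield FileCheckJob(script, tuple(batch))
            batch = []
    if batch:
        # Emit the final, possibly smaller, batch.
        yield FileCheckJob(script, tuple(batch))
```

Each job carries everything a worker needs, so batches can be processed independently and their results stored wherever is convenient.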

Meaningful sampling seems like the hardest part of this (unless you are fine with iterating over all the files and skipping most of them).
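For the "iterate over everything and skip most of it" case, one option is reservoir sampling (Algorithm R), which draws a uniform sample of fixed size from a stream of unknown length in a single pass. The Python sketch below is an illustration only, not something proposed in this task, and it still reads every entry of the listing once; `reservoir_sample` is a hypothetical name.

```python
import random
from typing import Iterable, List, Optional, TypeVar

T = TypeVar("T")


def reservoir_sample(
    items: Iterable[T], k: int, rng: Optional[random.Random] = None
) -> List[T]:
    """Return up to k items chosen uniformly at random from `items`."""
    rng = rng or random.Random()
    sample: List[T] = []
    for i, item in enumerate(items):
        if i < k:
            # Fill the reservoir with the first k items.
            sample.append(item)
        else:
            # Keep the new item with probability k / (i + 1).
            j = rng.randint(0, i)  # inclusive on both ends
            if j < k:
                sample[j] = item
    return sample
```

Passing a seeded `random.Random` makes a run reproducible, which helps when a sampled check needs to be repeated on the same files.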