The current state of upload tools is that we at Multimedia do not know that something is horribly broken until someone nicely points it out to us. This is terrible. We should be alerted about things like no cross-wiki uploads coming through (T132612) or a new error message popping up in large numbers (T130238).
I believe we've spoken about this at some length - namely, I pointed out that it's entirely possible (if not likely) that uploads from a certain tool might stop for totally innocent reasons - like on-wiki admins directing people to another tool, or whatever. But, I like the principle, as long as the thresholds are appropriate and the notifications aren't too blaring or constant.
Perhaps we could have an IRC bot, for example, that checks the state of uploads on the cluster every 30 minutes (or less often?). If the upload count is, let's say, more than two standard deviations below the mean for the past hour, the bot raises a warning in the -multimedia channel. I don't think filing a Phabricator task would be useful or appropriate. But we could do this for different tools fairly easily (CWUs are tagged, UploadWizard has a particular edit summary), and we could compare those results to the total uploads to determine whether the entire upload system is down, or just one tool, or a couple of tools that use the same library.
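To make the threshold idea concrete, here is a rough Python sketch of the check, not a finished bot. It assumes the upload counts per tool have already been collected somehow (for example from change tags or edit summaries); the function names, the `"total"` key, and the shape of the `history` and `current` dictionaries are all made up for illustration. Fetching the counts and posting to IRC are left out entirely.

```python
from statistics import mean, stdev


def is_low(history, current, n_sigma=2.0):
    """Return True if `current` is more than `n_sigma` sample standard
    deviations below the mean of `history` (a list of earlier counts)."""
    if len(history) < 2:
        # Not enough samples to estimate a spread, so never alert.
        return False
    threshold = mean(history) - n_sigma * stdev(history)
    return current < threshold


def diagnose(history, current, n_sigma=2.0):
    """Compare each tool against its own history and against total uploads.

    `history` maps a name to a list of earlier counts and `current` maps the
    same names to the latest count. The hypothetical key "total" holds all
    uploads; every other key is one tool.
    Returns a (status, low_tools) tuple.
    """
    low_tools = sorted(
        name
        for name in current
        if name != "total" and is_low(history[name], current[name], n_sigma)
    )
    if is_low(history["total"], current["total"], n_sigma):
        # Total uploads dropped too: likely a system-wide problem.
        return "all uploads low", low_tools
    if low_tools:
        # Only some tools dropped: likely a tool or shared-library problem.
        return "tools low", low_tools
    return "ok", []


if __name__ == "__main__":
    # Invented numbers, purely to show the call shape.
    history = {
        "total": [100, 110, 90, 105],
        "cross-wiki": [20, 22, 18, 21],
        "UploadWizard": [50, 55, 45, 52],
    }
    current = {"total": 95, "cross-wiki": 0, "UploadWizard": 51}
    print(diagnose(history, current))
```

With the invented numbers above, only the cross-wiki count falls below its threshold, so the result points at a single tool rather than the whole upload system. Note that a perfectly flat history has a standard deviation of zero, so any drop at all would trigger a warning; a real bot would probably want a minimum absolute drop as well, which ties into the concern about tools going quiet for innocent reasons.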
I'm prioritising as "high" because I'd like to see this happen, but I think this will naturally sink to "normal" if we don't get to it in the next month or so. I believe it might be useful to do this at the dev summit.
@matmarex: Do you know if there is any existing team or project tag which still cares about this task, or is this task just invalid these days?