We have standards for how we define ORES thresholds, but they assume the models perform well. When a model performs poorly (which might be the case for new languages, or might be due to a regression in an existing model), we need a strategy for degrading the threshold levels used on-wiki. @Catrope currently uses https://jsfiddle.net/catrope/50n1ekgu/ to generate https://docs.google.com/spreadsheets/d/1c8NMubBO0AS5KOFt5og0gz3dJK0IjEB3FwYOWEOXQ2g/edit?usp=sharing , then applies savage eyeball metrics to decide whether a model is satisfactory, and how to fudge the levels if it is not. For example, a weak model might get only a single "likely damaging" level and no others, because its granularity isn't good enough to warrant extra levels.
It would be nice to formalize these rules so that we can one day automate the process or at least pass it on to additional humans.
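A rough sketch of what "formalized" could look like, in Python: each level gets explicit precision/recall targets, levels the model can't meet get dropped, and levels whose thresholds land too close together get collapsed. The level names, target numbers, and the MIN_THRESHOLD_GAP rule below are all hypothetical placeholders, not our actual standards:

```
from dataclasses import dataclass
from typing import Optional

@dataclass
class Level:
    name: str
    min_precision: float  # precision the level must reach to be usable
    min_recall: float     # recall floor so the level still catches enough edits

@dataclass
class Point:
    threshold: float
    precision: float
    recall: float

# Hypothetical level definitions; the real names and targets would come
# from the threshold standards this task wants to write down.
LEVELS = [
    Level("maybe damaging", min_precision=0.15, min_recall=0.90),
    Level("likely damaging", min_precision=0.45, min_recall=0.50),
    Level("very likely damaging", min_precision=0.90, min_recall=0.10),
]

MIN_THRESHOLD_GAP = 0.05  # drop a level that lands too close to its neighbor


def pick_threshold(points: list[Point], level: Level) -> Optional[float]:
    """Return the lowest threshold meeting the level's targets, or None."""
    candidates = [
        p for p in points
        if p.precision >= level.min_precision and p.recall >= level.min_recall
    ]
    return min((p.threshold for p in candidates), default=None)


def degrade_levels(points: list[Point]) -> dict[str, float]:
    """Keep only the levels the model can support with enough separation."""
    chosen: dict[str, float] = {}
    last = None
    for level in LEVELS:
        t = pick_threshold(points, level)
        if t is None:
            continue  # the model can't support this level at all
        if last is not None and t - last < MIN_THRESHOLD_GAP:
            continue  # not enough granularity to justify a separate level
        chosen[level.name] = t
        last = t
    return chosen
```

The `points` would come from the model's threshold statistics (the same data the fiddle pulls to build the spreadsheet); a model that only clears the middle target would come out with just the single "likely damaging" level, matching the degraded case described above.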