Let's get some useful thresholds for the models. Generally, these thresholds are going to look a lot worse than they really are -- mostly because the labels we used for training are messy and incomplete. We're targeting at least 70% precision, but we're likely to actually achieve that even when we ask for only 50% precision -- and in some cases, we'll still get it when we target even lower precision.
So! We're going to use ORES's "threshold optimization" querying system. We'll need to make one call per topic in order to get an appropriate threshold:
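A minimal sketch of what one of these calls might look like. The endpoint shape follows ORES's documented v3 `model_info` API, but the exact quoting of labels and optimization strings here is an assumption, as are the example wiki/model names:

```python
from urllib.parse import quote

ORES_HOST = "https://ores.wikimedia.org"  # public ORES endpoint

def threshold_query_url(context, model, label, min_precision):
    """Build an ORES v3 model_info query asking for the threshold that
    maximizes recall subject to a minimum precision, for one topic label.
    (Quoting conventions here are an assumption -- check against a live
    response before relying on them.)"""
    optimization = 'maximum recall @ precision >= {}'.format(min_precision)
    model_info = 'statistics.thresholds."{}"."{}"'.format(label, optimization)
    return "{}/v3/scores/{}/?models={}&model_info={}".format(
        ORES_HOST, context, model, quote(model_info, safe='.":=@'))

# One call per topic label -- e.g. the biography topic at 70% precision:
url = threshold_query_url("enwiki", "articletopic",
                          "Culture.Biography.Biography*", 0.7)
# The URL can then be fetched with any HTTP client and the threshold
# read out of the returned JSON.
```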
Here, we can see some diversity. Culture.Biography.Biography* is easy to model and very common in the labeled data, so we can get both very high precision and very high recall with a strict threshold. STEM.Mathematics is on the other end of the spectrum: there are very few math-related articles at all, so I've relaxed the minimum precision to 0.3 in order to get a usable threshold.
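The per-topic relaxation described above can be sketched as a simple fallback: query at the strictest precision target first, and step down only when the model can't meet it. This helper (and the precision ladder in it) is illustrative, not part of ORES:

```python
def pick_threshold(results):
    """Given {min_precision: threshold_or_None} collected from successive
    ORES queries, return (precision, threshold) for the strictest
    precision target that actually yielded a threshold, or None if
    every target failed.  The 0.7 -> 0.5 -> 0.3 ladder is our choice,
    not anything ORES prescribes."""
    for target in sorted(results, reverse=True):
        if results[target] is not None:
            return target, results[target]
    return None

# Easy topic: meets the strict target, so we keep it.
pick_threshold({0.7: 0.91, 0.5: 0.64, 0.3: 0.41})   # -> (0.7, 0.91)

# Sparse topic like STEM.Mathematics: only the relaxed target works.
pick_threshold({0.7: None, 0.5: None, 0.3: 0.12})   # -> (0.3, 0.12)
```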