Paste P10310

Get ORES topic thresholds.

Authored by Halfak on Feb 4 2020, 9:16 PM.
Let's get some useful thresholds for these models. Generally, the threshold statistics are going to look a lot worse than they really are -- mostly because the labels we trained on are messy and incomplete. We're targeting at least 70% precision in practice, but we're likely to get that even when we only ask for 50% precision -- and in some cases, we'll still get it when we target even lower precision.
So! We're going to use the ORES "threshold optimization" querying system. We'll need to make one call per topic to get an appropriate threshold (a scripted version of these queries is sketched after the results below):
* Culture.Biography.Biography* [[maximum recall @ precision >= 0.5](https://ores.wikimedia.org/v3/scores/enwiki/?models=articletopic&model_info=statistics.thresholds.%22Culture.Biography.Biography*%22.%22maximum%20recall%20@%20precision%20%3E=%200.5%22)]
```
{
"!f1": 0.925,
"!precision": 0.996,
"!recall": 0.863,
"accuracy": 0.877,
"f1": 0.662,
"filter_rate": 0.759,
"fpr": 0.137,
"match_rate": 0.241,
"precision": 0.5,
"recall": 0.977,
"threshold": 0.086
}
```
* Culture.Biography.Women [[maximum recall @ precision >= 0.5](https://ores.wikimedia.org/v3/scores/enwiki/?models=articletopic&model_info=statistics.thresholds.%22Culture.Biography.Women%22.%22maximum%20recall%20@%20precision%20%3E=%200.5%22)]
```
{
"!f1": 0.993,
"!precision": 0.995,
"!recall": 0.99,
"accuracy": 0.985,
"f1": 0.572,
"filter_rate": 0.981,
"fpr": 0.01,
"match_rate": 0.019,
"precision": 0.501,
"recall": 0.668,
"threshold": 0.667
}
```
* Culture.Media.Entertainment [[maximum recall @ precision >= 0.5](https://ores.wikimedia.org/v3/scores/enwiki/?models=articletopic&model_info=statistics.thresholds.%22Culture.Media.Entertainment%22.%22maximum%20recall%20@%20precision%20%3E=%200.5%22)]
```
{
"!f1": 0.998,
"!precision": 0.998,
"!recall": 0.998,
"accuracy": 0.996,
"f1": 0.47,
"filter_rate": 0.997,
"fpr": 0.002,
"match_rate": 0.003,
"precision": 0.503,
"recall": 0.442,
"threshold": 0.646
}
```
* STEM.Mathematics [[maximum recall @ precision >= 0.3](https://ores.wikimedia.org/v3/scores/enwiki/?models=articletopic&model_info=statistics.thresholds.%22STEM.Mathematics%22.%22maximum%20recall%20@%20precision%20%3E=%200.3%22)]
```
{
"!f1": 1.0,
"!precision": 1.0,
"!recall": 0.999,
"accuracy": 0.999,
"f1": 0.401,
"filter_rate": 0.999,
"fpr": 0.001,
"match_rate": 0.001,
"precision": 0.309,
"recall": 0.571,
"threshold": 0.903
}
```
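Rather than pasting each URL by hand, the queries above can be scripted. Here's a minimal sketch using Python and `requests`; the precision targets mirror the queries above, but the key path used to dig the threshold stats out of the response is my assumption about the response shape, so check it against a real API response before relying on it.
```
import requests

ORES_URL = "https://ores.wikimedia.org/v3/scores/enwiki/"

# Precision targets per topic; STEM.Mathematics is relaxed to 0.3 as above.
TARGETS = {
    "Culture.Biography.Biography*": 0.5,
    "Culture.Biography.Women": 0.5,
    "Culture.Media.Entertainment": 0.5,
    "STEM.Mathematics": 0.3,
}


def get_threshold(topic, min_precision):
    """Ask ORES for the 'maximum recall @ precision >= X' threshold stats
    for one articletopic label.  Returns the stats dict, or None if the
    precision target can't be met."""
    optimization = 'maximum recall @ precision >= {}'.format(min_precision)
    model_info = 'statistics.thresholds."{}"."{}"'.format(topic, optimization)
    response = requests.get(ORES_URL, params={
        "models": "articletopic",
        "model_info": model_info,
    })
    response.raise_for_status()
    doc = response.json()
    # Assumed response path: context -> models -> model -> statistics -> thresholds
    thresholds = (doc["enwiki"]["models"]["articletopic"]
                  ["statistics"]["thresholds"][topic])
    return thresholds[0] if thresholds else None


if __name__ == "__main__":
    for topic, min_precision in TARGETS.items():
        stats = get_threshold(topic, min_precision)
        if stats is None:
            print("{}: no threshold satisfies precision >= {}".format(
                topic, min_precision))
        else:
            print("{}: threshold={} precision={} recall={}".format(
                topic, stats["threshold"], stats["precision"], stats["recall"]))
```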
Looking across these four results, we can see some diversity. Culture.Biography.Biography* is easy to model and very common in the labeled data, so we can hit the 0.5 precision target with very high recall (0.977). STEM.Mathematics is on the other end of the spectrum: there are very few math-related articles at all, so I've relaxed the minimum precision to 0.3 in order to get a usable threshold.
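Once a threshold is in hand, applying it is just a comparison against the articletopic probability for that label. A rough sketch follows -- the revision ID in the usage comment is a placeholder, and the response key path is again my assumption to verify against a real ORES response.
```
import requests

ORES_URL = "https://ores.wikimedia.org/v3/scores/enwiki/"


def topic_matches(rev_id, topic, threshold):
    """Return True if the articletopic probability for `topic` on this
    revision meets or exceeds `threshold`."""
    response = requests.get(ORES_URL, params={
        "models": "articletopic",
        "revids": rev_id,
    })
    response.raise_for_status()
    doc = response.json()
    # Assumed response path: context -> scores -> revid -> model -> score
    score = doc["enwiki"]["scores"][str(rev_id)]["articletopic"]["score"]
    return score["probability"][topic] >= threshold


# e.g., using the Culture.Biography.Biography* threshold from above
# (987654321 is a placeholder revision ID):
# topic_matches(987654321, "Culture.Biography.Biography*", 0.086)
```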