
[XL] Estimate coverage of image suggestions at different confidence levels
Closed, Resolved, Public

Description

We want to get estimates of how many total unillustrated articles on each of the relevant wikis will have an image recommended by the new pipeline, for different levels of likelihood-that-an-image-is-good in the recommendation. This is necessary for us to make a decision about which confidence score cutoff to use in making the recommendations. In general, we want the highest confidence score possible, but if there aren't enough recommendations at a high score, we will consider using a lower score.

The wikis are:
pt
ru
id

The likelihood-that-an-image-is-good levels we want to measure are 0.9, 0.8, and 0.7.

Acceptance criteria:

  • Document the number of suggestions for unillustrated articles in the above wikis at the 0.9 confidence level
  • Work with product management to evaluate whether that number is sufficient
  • If not, measure again at the 0.8 level, etc.
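The stepping-down procedure in the criteria above can be sketched as a small helper. This is only an illustration: the function name `highest_sufficient_level`, the `minimum` parameter, and the example numbers are all hypothetical, and what counts as "sufficient" is a product decision that this task does not quantify.

```python
from typing import Mapping, Optional


def highest_sufficient_level(counts: Mapping[float, int], minimum: int) -> Optional[float]:
    """Return the highest confidence level whose suggestion count reaches `minimum`.

    `counts` maps a confidence level (e.g. 0.9) to the number of articles
    that have a suggestion at or above that level. Returns None if no
    level is sufficient.
    """
    # Try the strictest level first, then step down (0.9, 0.8, 0.7, ...).
    for level in sorted(counts, reverse=True):
        if counts[level] >= minimum:
            return level
    return None


# Made-up numbers, for illustration only (not measurements from any wiki).
example_counts = {0.9: 100, 0.8: 500, 0.7: 900}
print(highest_sufficient_level(example_counts, minimum=300))
```

With these made-up inputs the 0.9 level falls short of the hypothetical minimum, so the helper steps down to 0.8.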

Event Timeline

CBogen renamed this task from Estimate mediasearch coverage of image recommendations at different confidence levels to [XL] Estimate mediasearch coverage of image recommendations at different confidence levels. Jun 30 2021, 4:40 PM

We're probably not going to be suggesting images only for unillustrated articles (at least for structured data), and because our target for images-added has changed, this information is probably no longer very useful for us.

CBogen updated the task description.
CBogen renamed this task from [XL] Estimate mediasearch coverage of image recommendations at different confidence levels to [XL] Estimate coverage of image recommendations at different confidence levels. Mar 14 2022, 4:21 PM
CBogen updated the task description.
CBogen renamed this task from [XL] Estimate coverage of image recommendations at different confidence levels to [XL] Estimate coverage of image suggestions at different confidence levels. Mar 14 2022, 4:24 PM
CBogen updated the task description.

We'll need at least a preliminary dataset from to do this work

Moved this into blocked - it should be quite easy to do once we have T299789 done, so there's no point in wasting effort doing it before then

Confidence >= 90%

wiki     pages_with_suggestions
ptwiki   7274
idwiki   4589
ruwiki   2562

Confidence >= 80%

wiki     pages_with_suggestions
ptwiki   126607
idwiki   64413
ruwiki   101126

Confidence >= 70%

wiki     pages_with_suggestions
ptwiki   129440
idwiki   66243
ruwiki   104690
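Because each count is cumulative (pages with a suggestion at or above the threshold), the gain from lowering the cutoff is just the difference between adjacent levels. A minimal sketch of that arithmetic, using the counts reported above; the dictionary layout and the `gain` helper are illustrative names, not part of the pipeline:

```python
# Pages with at least one suggestion, keyed by wiki and by minimum
# confidence in percent (figures copied from the results above).
counts = {
    "ptwiki": {90: 7274, 80: 126607, 70: 129440},
    "idwiki": {90: 4589, 80: 64413, 70: 66243},
    "ruwiki": {90: 2562, 80: 101126, 70: 104690},
}


def gain(wiki: str, from_level: int, to_level: int) -> int:
    """Additional pages covered when the cutoff drops from `from_level` to `to_level`."""
    return counts[wiki][to_level] - counts[wiki][from_level]


for wiki in counts:
    print(wiki, "90->80:", gain(wiki, 90, 80), "80->70:", gain(wiki, 80, 70))
```

On these figures, most of the additional coverage comes from the step from 90% to 80%; the step from 80% to 70% adds comparatively few pages on each wiki.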

Ok to resolve this @CBogen ?

@Cparle which confidence level are we using in the current iteration of the data pipeline?

also just tagging @SWakiyama so she's aware.

Once you answer this, we can close the ticket, thanks!

@Cparle which confidence level are we using in the current iteration of the data pipeline?

We're writing suggestions at each confidence level (0.7, 0.8, 0.9), and leaving the decision on which to use up to the client. Does that answer your question?

CBogen claimed this task.

It does, thanks! I think @SWakiyama then needs to decide, based on the information gathered for this ticket, which confidence level to use in T292147 (AC #3). I'll follow up with her offline and close this ticket.

Thanks Cormac! We'll use >= 80% when making suggestions.