
[Investigation] Missing revertrisk scores in dumps - Time box 1 week
Closed, Resolved · Public · Spike

Description

According to a dump reuser, about half of the articles/new revisions in the last monthly dump were missing revertrisk scores.

We believe that this may be expected, as many articles present in the dump may not have been edited since revertrisk went live. It is also possible that revertrisk is timing out and returning nulls, or that event mismatches are causing this.

As the product manager for content integrity, I want to confirm what percentage of missing revertrisk scores are due to reasons NOT related to post-bulk-ingestion, and what those reasons are.
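One way to produce this percentage breakdown is to bucket every revision that lacks a score by its probable cause. The sketch below is illustrative only: the `Revision` fields, the `score_live_since` cutoff, and the reason strings are hypothetical placeholders, not the actual dump schema or failure-register format. It treats "revision predates revertrisk going live" as the expected case described above and reports how much of the gap falls outside it.

```python
from collections import Counter
from dataclasses import dataclass
from datetime import datetime
from typing import Dict, Iterable, Optional


@dataclass
class Revision:
    # Hypothetical fields; the real dump and failure-register schemas may differ.
    rev_id: int
    rev_timestamp: datetime
    has_score: bool
    failure_reason: Optional[str] = None  # e.g. "timeout", "event_mismatch"


def classify_missing(revs: Iterable[Revision], score_live_since: datetime) -> Counter:
    """Count revisions without a revertrisk score, grouped by probable cause."""
    counts: Counter = Counter()
    for rev in revs:
        if rev.has_score:
            continue
        if rev.rev_timestamp < score_live_since:
            # Expected gap: the revision predates revertrisk going live.
            counts["predates_revertrisk"] += 1
        elif rev.failure_reason:
            # Cause recorded in the (hypothetical) failure register.
            counts[rev.failure_reason] += 1
        else:
            counts["unexplained"] += 1
    return counts


def percentages(counts: Counter) -> Dict[str, float]:
    """Share of missing scores per cause, in percent."""
    total = sum(counts.values())
    if total == 0:
        return {}
    return {reason: 100 * n / total for reason, n in counts.items()}


def share_not_explained_by_age(counts: Counter) -> float:
    """Percentage of missing scores NOT explained by the revision predating revertrisk."""
    total = sum(counts.values())
    if total == 0:
        return 0.0
    return 100 * (total - counts["predates_revertrisk"]) / total


if __name__ == "__main__":
    cutoff = datetime(2024, 1, 1)  # hypothetical go-live date
    sample = [
        Revision(1, datetime(2023, 6, 1), False),
        Revision(2, datetime(2024, 3, 1), False, "timeout"),
        Revision(3, datetime(2024, 3, 2), True),
        Revision(4, datetime(2024, 4, 1), False),
    ]
    counts = classify_missing(sample, cutoff)
    print(percentages(counts))
    print(share_not_explained_by_age(counts))
```

Running the same classification separately per language edition would give the per-language summary the acceptance criteria ask for.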

Acceptance criteria

  • You have reviewed failure logs in our register
  • Summarize log failures for as many of the top 5 languages as possible; start with English.
    • Use the following criteria to find a statistically significant subset of articles:
    • Split revisions by length into quartiles (four equal-sized groups).
    • Analyze 10 revisions in each quartile.
      • *NOTE*: If quartile analysis is too hard or not possible, let me know ASAP.
  • A document with the findings is shared
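The quartile criteria above can be sketched as follows, assuming revisions are available as hypothetical `(rev_id, length)` pairs. Revisions are ranked by length and cut into four equal-sized groups (sizes differ by at most one), then up to `per_quartile` revisions are drawn at random from each group with a fixed seed so the sample is reproducible. This sketch does not by itself establish whether 10 per quartile is enough for statistical significance.

```python
import random
from typing import List, Tuple

# Hypothetical input shape: (rev_id, revision length in bytes).
Rev = Tuple[int, int]


def split_into_quartiles(revisions: List[Rev]) -> List[List[Rev]]:
    """Rank revisions by length and cut them into four equal-sized groups."""
    ordered = sorted(revisions, key=lambda r: r[1])
    n = len(ordered)
    return [ordered[i * n // 4:(i + 1) * n // 4] for i in range(4)]


def sample_per_quartile(
    revisions: List[Rev], per_quartile: int = 10, seed: int = 0
) -> List[List[Rev]]:
    """Draw up to `per_quartile` revisions at random from each length quartile."""
    rng = random.Random(seed)  # fixed seed so the sample can be reproduced
    return [
        rng.sample(group, min(per_quartile, len(group)))
        for group in split_into_quartiles(revisions)
    ]


if __name__ == "__main__":
    revs = [(rev_id, rev_id * 10) for rev_id in range(100)]
    for i, group in enumerate(sample_per_quartile(revs), start=1):
        print(f"Q{i}:", sorted(rid for rid, _ in group))
```

Ranking by position rather than cutting at quantile values keeps the groups equal in size even when many revisions share the same length.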

Event Timeline

JArguello-WMF renamed this task from "[Spike] Missing revertrisk scores in dumps" to "[Investigation] Missing revertrisk scores in dumps - Francisco to scope down the ticket". · Sep 26 2024, 1:53 PM
JArguello-WMF added a project: Spike.
JArguello-WMF updated the task description.
Restricted Application changed the subtype of this task from "Task" to "Spike". · Sep 26 2024, 1:53 PM
JArguello-WMF renamed this task from "[Investigation] Missing revertrisk scores in dumps - Francisco to scope down the ticket" to "[Investigation] Missing revertrisk scores in dumps - Time box 1 week". · Sep 26 2024, 1:59 PM
JArguello-WMF updated the task description.