According to a dump reuser, about half of the articles/new revisions in the last monthly dump were missing revertrisk scores.
We believe this may be expected, since many articles in the dump may not have been edited or revised since revertrisk went live. It is also possible that revertrisk is timing out and returning nulls, or that event mismatches are causing the gaps.
As the product manager for content integrity, I want to confirm what percentage of missing revertrisk scores are due to reasons NOT related to post-bulk-ingestion, and what those reasons are.
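A minimal sketch of how that percentage could be computed, assuming each dump revision can be reduced to a timestamp and a score that may be null. The dict keys `timestamp` and `revertrisk`, the function name, and the `live_since` cutoff are all hypothetical placeholders, not names taken from the dump schema.

```python
from datetime import datetime, timezone


def missing_score_breakdown(revisions, live_since):
    """Count missing scores and split them by whether the revision predates launch.

    `revisions` is an iterable of dicts with hypothetical keys
    'timestamp' (timezone-aware datetime) and 'revertrisk' (float or None).
    `live_since` is the assumed date revertrisk scoring went live.
    """
    total = missing = predates = 0
    for rev in revisions:
        total += 1
        if rev["revertrisk"] is None:
            missing += 1
            # A missing score on a revision older than the launch is the expected case.
            if rev["timestamp"] < live_since:
                predates += 1
    unexplained = missing - predates
    return {
        "total": total,
        "missing": missing,
        "missing_predates_launch": predates,
        "missing_unexplained": unexplained,
        # Share of the missing scores that the launch date does not explain.
        "pct_missing_unexplained": 100.0 * unexplained / missing if missing else 0.0,
    }
```

The revisions counted under `missing_unexplained` are the ones to cross-check against the failure logs (timeouts, event mismatches).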
Acceptance criteria
- You have reviewed failure logs in our register
- Summarize log failures in as many of the top 5 languages as possible, starting with English.
- Use the following criteria to find a statistically significant subset of articles:
  - Split revisions by length into 4 equal quartiles.
  - Analyze a sample of 10 revisions from each quartile.
  - *NOTE* If quartile analysis is too hard or not possible, let me know ASAP.
- A document with the findings is shared
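
The quartile split and per-quartile sampling in the criteria above could be done along these lines. This is only a sketch under stated assumptions: revisions are taken to be `(rev_id, length)` pairs, quartiles are formed by rank after sorting on length (so the four groups are equal in size up to rounding), and the function name and seed are hypothetical.

```python
import random


def sample_by_length_quartile(revisions, per_quartile=10, seed=0):
    """Return {quartile_number: [rev_id, ...]} with up to `per_quartile` ids each.

    `revisions` is a list of (rev_id, length) tuples (assumed shape).
    """
    # Sort by length so that rank-based slices form the four quartiles.
    ordered = sorted(revisions, key=lambda rev: rev[1])
    n = len(ordered)
    rng = random.Random(seed)  # fixed seed keeps the sample reproducible
    samples = {}
    for q in range(4):
        bucket = ordered[q * n // 4:(q + 1) * n // 4]
        k = min(per_quartile, len(bucket))
        samples[q + 1] = [rev_id for rev_id, _ in rng.sample(bucket, k)]
    return samples
```

Running this once per language would yield at most 40 revisions to review per language. If the dump does not expose revision length, the sort key would need to change.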