O4 KR 1
As a PM, I'd like to know our coverage and accuracy for abstracts, infoboxes and sections in the languages listed below.
*TO DO* User Story: “As a PM, I want to finish the Data Coverage document, where the scope, methods and outcomes of coverage are defined for MR, so that we have a documented understanding of what has been done and can gather insights on how we could build up a larger testing framework.”
**Acceptance criteria**
[] Data coverage outputs documented for
  [ ] Abstracts
  [ ] Sections
  [ ] Infoboxes
[ ] use Abstract random test (see code) in the below languages on 10K sample set and 1K statistical sample set, outcome csv with error
[ ] use Infobox manual check (see docs) in the below languages on 10K sample set and 1K statistical sample set, outcome in doc
[ ] use Sections manual check (see docs) in the below languages on 10K sample set and 1K statistical sample set, outcome in doc
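A minimal sketch of how the 10K and 1K sample sets mentioned in the criteria could be drawn reproducibly. This is an assumption for illustration only: the task does not say how the sets are produced, whether the 1K set is a subset of the 10K set, or what the identifiers look like. The function name `draw_sample` and the integer ids are hypothetical.

```python
import random


def draw_sample(article_ids, size, seed=0):
    """Return `size` distinct ids chosen uniformly at random.

    A fixed seed makes the draw repeatable, so the same sample can be
    re-checked later. (Hypothetical helper, not the team's actual tooling.)
    """
    ids = list(article_ids)
    if size > len(ids):
        raise ValueError("sample size exceeds population size")
    rng = random.Random(seed)
    return rng.sample(ids, size)


# Assumed workflow: draw 10K ids per language, then a 1K subset of those.
population = range(50_000)                           # placeholder ids
sample_10k = draw_sample(population, 10_000, seed=1)
sample_1k = draw_sample(sample_10k, 1_000, seed=2)
```

Recording the seed next to each outcome file would let anyone regenerate the exact sample that a result refers to.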
The document must have the following sections:
[] A Description of how we get our sample sets, where we got them from (What dataset was used to determine the outcomes?)
[] A Description of the tools and what was used to test
[] An explanation of what the confusion matrix sees as FN, FP, TN, TP and what the caveats are
[] A confusion matrix for Infoboxes, Sections and Abstracts, including a brief analysis to accompany the numbers
[] Document must have all pending comments in Data Coverage document made until Feb 28th marked as resolved
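To make the confusion matrix requirement concrete, here is a minimal sketch of how TP, FP, FN and TN could be counted for one element type in one language. The definitions are an assumption, not the agreed ones: `expected` is taken to mean "the source article has the element" and `observed` to mean "our output contains it". The function names are hypothetical, and the actual definitions and caveats are what the document itself needs to state.

```python
def confusion_counts(expected, observed):
    """Count TP, FP, FN, TN from two equal-length lists of booleans.

    expected[i]: the source article has the element (assumed meaning).
    observed[i]: our output contains the element (assumed meaning).
    """
    if len(expected) != len(observed):
        raise ValueError("expected and observed must have the same length")
    counts = {"TP": 0, "FP": 0, "FN": 0, "TN": 0}
    for exp, obs in zip(expected, observed):
        if exp and obs:
            counts["TP"] += 1   # present in source and in output
        elif obs:
            counts["FP"] += 1   # in output but not in source
        elif exp:
            counts["FN"] += 1   # in source but missing from output
        else:
            counts["TN"] += 1   # absent from both
    return counts


def precision_recall(counts):
    """Return (precision, recall); None where the denominator is zero."""
    tp, fp, fn = counts["TP"], counts["FP"], counts["FN"]
    precision = tp / (tp + fp) if tp + fp else None
    recall = tp / (tp + fn) if tp + fn else None
    return precision, recall


# Toy data for five articles, purely illustrative.
expected = [True, True, False, False, True]
observed = [True, False, True, False, True]
counts = confusion_counts(expected, observed)
precision, recall = precision_recall(counts)
```

Reporting precision and recall next to the raw counts keeps the brief analysis comparable across languages with different sample compositions.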
===== Things to consider: =====
1. If devs find unknowns or outstanding questions along the way about how the task should be tackled, they must log them in the decision log and consult with the team on the best path forward before implementing any changes. Product has agreed to reduce velocity until outstanding questions and unknowns are resolved.
2. This is a parent task and will be broken down into smaller chunks of work or subtasks.
3. Redoing past work is a risk that product has accepted (the risk cannot be resolved, so it must be accepted as-is and dealt with as necessary)
4. Log findings, so that they can be triaged either as known issues or as needed fixes (which will need phab tickets and be prioritized)

*Languages*
//English//, //Spanish//, //French//, //German//, //Italian//, //Portuguese//, Hindi, Korean, //Japanese//, Arabic, Indonesian, //Dutch//, Swedish, Vietnamese, Malay, //Polish//, Turkish, Mandarin, Telugu and Tamil
* languages in italics have been partially checked before