
{Machine Readability} Coverage metrics of Machine Readability features (abstracts, infobox, sections, lists)
Open, Needs Triage · Public · 13 Estimated Story Points

Description

O4 KR 1

As a PM, I'd like to know our coverage of abstracts, infoboxes, and sections in the languages listed below.

User Story: “As a PM, I want to finish the Data Coverage document, in which the scope, methods, and outcomes of coverage for MR are defined and agreed on, so that we have a shared understanding of what has been done and a way to gather insights and take informed steps toward building a larger testing and confidence framework.”

Acceptance criteria

  • Data coverage outputs documented for the following three features:
    • Abstracts
    • Sections
    • Infoboxes

For the languages: English, Spanish, French, German, Italian, Portuguese
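As an illustration only, the per-feature, per-language coverage outputs could be tabulated as below. This is a minimal Python sketch assuming coverage is read as the share of sampled articles for which a feature was extracted; the `SampleResult` record, its fields, and `coverage_table` are hypothetical names, not part of any existing MR tooling.

```python
from collections import defaultdict
from dataclasses import dataclass


# Hypothetical record: one sampled article, one feature, and whether
# the extraction output contained that feature.
@dataclass(frozen=True)
class SampleResult:
    language: str    # e.g. "en", "es", "fr", "de", "it", "pt"
    feature: str     # e.g. "abstract", "section", "infobox"
    extracted: bool  # True if the output contained the feature


def coverage_table(results):
    """Return {(feature, language): share of sampled articles with the feature extracted}."""
    totals = defaultdict(int)
    hits = defaultdict(int)
    for r in results:
        key = (r.feature, r.language)
        totals[key] += 1
        if r.extracted:
            hits[key] += 1
    return {key: hits[key] / totals[key] for key in totals}
```

For example, two English infobox samples where only one was extracted give `0.5` for `("infobox", "en")`. Note that this only measures presence in the output; judging whether the extracted content is correct still requires the ground truth.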

The document must have the following sections:

  • A description of how we obtain our sample sets and where they come from (which dataset was used to determine the outcomes?)
  • A description of what was used as the ground truth to test against
  • An explanation of what FN, FP, TN, and TP mean for each feature, with examples, so that the team reaches a shared understanding and agrees that a confusion matrix works for all features
  • The agreed method, run and documented on the test sample for infoboxes, sections, and abstracts
  • An analysis explaining the numbers and any known caveats (e.g., Reference sections are excluded)
  • All pending comments made in the Data Coverage document up to Feb 28th marked as resolved
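To make the confusion-matrix framing concrete, here is a minimal Python sketch, assuming each feature is judged simply as present or absent per sampled article by comparing the ground truth with the extraction output (the agreed per-feature definitions may well be finer, e.g. counted per section). `classify`, `confusion_matrix`, and `precision_recall` are hypothetical helpers, not part of the MR codebase.

```python
from collections import Counter


def classify(in_ground_truth: bool, in_output: bool) -> str:
    """Map one (article, feature) comparison to a confusion-matrix cell."""
    if in_ground_truth and in_output:
        return "TP"  # feature exists and was extracted
    if in_ground_truth:
        return "FN"  # feature exists but was missed
    if in_output:
        return "FP"  # something was extracted that the ground truth lacks
    return "TN"      # feature absent and nothing was extracted


def confusion_matrix(pairs):
    """Count cells for (in_ground_truth, in_output) pairs of one feature/language."""
    counts = Counter(classify(gt, out) for gt, out in pairs)
    return {cell: counts[cell] for cell in ("TP", "FP", "FN", "TN")}


def precision_recall(cm):
    """Precision = TP / (TP + FP); recall = TP / (TP + FN); 0.0 when undefined."""
    tp, fp, fn = cm["TP"], cm["FP"], cm["FN"]
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall
```

For example, five infobox comparisons `[(True, True), (True, False), (False, False), (False, True), (True, True)]` give TP=2, FP=1, FN=1, TN=1, hence precision and recall of 2/3 each. Keeping the cell mapping in a single function makes it easy to check each feature's FN/FP/TN/TP definition against the examples the team agrees on.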
Things to consider:
  1. If devs or product run into unknowns or outstanding questions about how the task should be tackled, they must log them in the decision log and consult the team on the best path forward before implementing any changes. Product has agreed to reduce velocity until outstanding questions and unknowns are resolved.
  2. This is a parent task and will be broken down into smaller chunks of work or subtasks.
  3. Redoing past work is a risk that product has accepted (the risk cannot be resolved, so it must be accepted as-is and dealt with as necessary).
  4. Log findings so that they can be triaged either as known issues or as needed fixes (which will need Phab tickets and prioritization).

Event Timeline

JArguello-WMF renamed this task from Check language coverage of infobox and abstracts to Check language coverage of infobox and abstracts - needs refinement. Jan 18 2024, 2:43 PM
JArguello-WMF assigned this task to E.Enabulele.
JArguello-WMF updated the task description.
JArguello-WMF set the point value for this task to 8.
JArguello-WMF changed the point value for this task from 8 to 13.
SDelbecque-WMF renamed this task from Check language coverage of infobox and abstracts - needs refinement to {Machine Readability} Coverage metrics of Machine Readability features (abstracts, infobox, sections). Jan 31 2024, 11:28 AM
SDelbecque-WMF added a project: Epic.
SDelbecque-WMF updated the task description.
SDelbecque-WMF renamed this task from {Machine Readability} Coverage metrics of Machine Readability features (abstracts, infobox, sections) to {Machine Readability} Coverage metrics of Machine Readability features (abstracts, infobox, sections, lists). Mar 21 2024, 12:50 PM