
{Investigation} RCA on Abstracts
Closed, Resolved · Public · 3 Estimated Story Points

Description

As a PM, I'd like to do a root cause analysis (RCA) on the Abstracts and Infobox feedback.

Abstracts

  • Toyota: the abstract starts with "is a Japanese multinational automotive"; the leading "Toyota" is missing.
  • Grimes: the abstract has an additional space before the comma: "Claire Elise Boucher , known professionally as ..."
  • Gwen Stefani: a stray semicolon follows the opening parenthesis: "Gwen Renée Stefani (; born October 3, 1969) is an ..."
  • Thesaurus: the quote is not properly flattened: "... express an idea: Synonym dictionaries have a long history..."
  • Body mass index: squared units are not properly rendered: "is expressed in units of kg/m2"
  • Aldi: the pronunciation is missing: "... introduced the name Aldi, which is pronounced. In ..."
  • Free trade agreements of the European Union: the list is flattened: "... of the process: Before negotiations start, me." It may be simpler to properly parse the list from https://en.wikipedia.org/w/api.php?format=json&action=query&prop=extracts&exintro=1&titles=Free_trade_agreements_of_the_European_Union&redirects=true
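Two of the defects above (the space before a comma in Grimes, and the stray semicolon after the opening parenthesis in Gwen Stefani) look like whitespace and punctuation left behind when inline elements are dropped during flattening. A minimal sketch of a post-processing pass for just those two cases follows. The function name `tidy_abstract` is hypothetical and this is not the project's actual code; the other defects (missing leading title, missing pronunciation, flattened lists) would need fixes at the parsing stage rather than a cleanup pass like this.

```python
import re


def tidy_abstract(text: str) -> str:
    """Hypothetical cleanup pass for flattened abstract text (illustrative only)."""
    # Drop parentheses that ended up empty after their contents were stripped.
    text = re.sub(r"\s*\(\s*\)", "", text)
    # Drop a semicolon or comma stranded right after an opening parenthesis:
    # "(; born ..." becomes "(born ...".
    text = re.sub(r"\(\s*[;,]\s*", "(", text)
    # Remove whitespace that precedes a comma or semicolon:
    # "Boucher , known" becomes "Boucher, known".
    text = re.sub(r"\s+([,;])", r"\1", text)
    # Collapse runs of spaces introduced by removed elements.
    text = re.sub(r" {2,}", " ", text)
    return text.strip()


if __name__ == "__main__":
    print(tidy_abstract("Claire Elise Boucher , known professionally as ..."))
    print(tidy_abstract("Gwen Renée Stefani (; born October 3, 1969) is an ..."))
```

A pass like this only hides the symptom; if the root cause is in how inline markup is removed, fixing it there avoids having to enumerate every leftover pattern.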

Acceptance Criteria

  • Documented Root Cause Analysis
  • Ticket(s) with the fixes needed

Event Timeline

One difference between GetAbstract and GetSections is that parentheses, along with their contents, are stripped from the abstracts. Example: "Freda Josephine Baker (née McDonald; June 3, 1906 – April 12, 1975),..." in the text is output as "Freda Josephine Baker,...". Should we also remove parenthesized text from the sections text?
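For reference, the stripping behaviour described in this comment can be sketched as below. This is an assumption about the effect, not the actual GetAbstract code, and `strip_parentheticals` is a hypothetical name. It removes the innermost parenthesized groups repeatedly so that nested parentheses are handled too.

```python
import re

# Matches one parenthesized group with no nested parentheses inside it,
# together with any whitespace immediately before it.
_INNERMOST_GROUP = re.compile(r"\s*\([^()]*\)")


def strip_parentheticals(text: str) -> str:
    """Hypothetical helper: remove parenthesized text, including nested groups."""
    previous = None
    # Repeat until a pass changes nothing, peeling nested groups from the inside out.
    while previous != text:
        previous = text
        text = _INNERMOST_GROUP.sub("", text)
    return text


if __name__ == "__main__":
    sample = "Freda Josephine Baker (née McDonald; June 3, 1906 – April 12, 1975),..."
    print(strip_parentheticals(sample))
```

Applying the same step to section text would also remove parentheticals that carry real content (units, abbreviations, citations), which is worth weighing before sharing the logic between the two methods.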

JArguello-WMF changed the point value for this task from 5 to 3. Dec 5 2023, 2:10 PM

Report in "RCA - Abstract Defects"

TL;DR: we refactor getAbstract to use the getSections method. We also update the getSections logic to fix most of the defects/requests. One or two requests are not going to be "fixed".