
French Infobox not working in Structured-Contents endpoint {1/2 day}
Closed, Resolved | Public | 1 Estimated Story Points | BUG REPORT

Description

Steps to replicate the issue (include links if applicable):

  • Call Structured-Contents endpoint with is_part_of.identifier set to "frwiki" for articles known to have an infobox (see Infobox Manual QA documentation).

What happens?:
An empty JSON payload is returned.

What should have happened instead?:
Infobox should be returned.

Additional Information:
For a list of frwiki articles with no infoboxes, consult the QA documentation mentioned above.

Event Timeline

JArguello-WMF renamed this task from French Infobox not working in Structured-Contents endpoint to French Infobox not working in Structured-Contents endpoint {1/2 day}. Sep 27 2023, 2:35 PM
JArguello-WMF set the point value for this task to 1.

I found the issue with French infoboxes. The French pages use infobox_v3 as the class name and not infobox like other languages. See the infobox here: https://fr.wikipedia.org/wiki/Jos%C3%A9phine_Baker

This is a small code change to add an extra check for infobox_v3 in our parser code. The French infoboxes do not conform to the typical infobox styles; instead they use HTML tables with TH and TD cells (and table captions for section headers). Our existing parser should do a reasonable job with the TH and TD cells. We may need a future ticket for the table captions.
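To illustrate the kind of check described above, here is a minimal Python sketch. It is not the actual parser code: the names INFOBOX_CLASSES, is_infobox and InfoboxRows are hypothetical, and the sketch only assumes that an infobox is a table whose class attribute contains either infobox or infobox_v3. Captions are deliberately ignored, matching the note that they may need a separate ticket.

```python
from html.parser import HTMLParser

# Assumption: class names that mark an infobox table. "infobox_v3" is the
# frwiki variant named in this task; the set itself is illustrative.
INFOBOX_CLASSES = {"infobox", "infobox_v3"}


def is_infobox(class_attr):
    """Return True if a class attribute contains a known infobox class."""
    return bool(INFOBOX_CLASSES & set((class_attr or "").split()))


class InfoboxRows(HTMLParser):
    """Collect (TH text, TD text) pairs from infobox tables."""

    def __init__(self):
        super().__init__()
        self.depth = 0    # > 0 while inside an infobox table
        self.cell = None  # "th" or "td" while inside a cell
        self.text = ""    # text accumulated for the current cell
        self.label = ""   # most recent TH text
        self.rows = []

    def handle_starttag(self, tag, attrs):
        if tag == "table":
            # Enter on an infobox table; count nested tables once inside.
            if self.depth or is_infobox(dict(attrs).get("class")):
                self.depth += 1
        elif self.depth and tag in ("th", "td"):
            self.cell = tag
            self.text = ""

    def handle_endtag(self, tag):
        if tag == "table" and self.depth:
            self.depth -= 1
        elif self.depth and tag == self.cell:
            value = self.text.strip()
            if tag == "th":
                self.label = value
            else:
                self.rows.append((self.label, value))
                self.label = ""
            self.cell = None

    def handle_data(self, data):
        # Text outside TH/TD cells (for example captions) is skipped.
        if self.cell:
            self.text += data
```

Feeding the parser a table with class="infobox_v3" should yield its TH/TD pairs in rows, while a table with any other class yields nothing.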

I need 2 hours to give dev a smoke test and will do it Monday morning. Then I will let Saphanie know she can run her QA tests on French infoboxes.

Tests show the French infobox is returned when I call /v2/structured-contents/Josephine_Baker in dev

With BODY:

{ 
    "filters": [{
            "field":"is_part_of.identifier", 
            "value":"frwiki"
        }
    ]
}
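For anyone repeating this smoke test, the request above could be assembled roughly as in the Python sketch below. The base URL is a placeholder (the dev host is not given in this task), the helper name build_request is hypothetical, and the use of POST is an assumption based on the request carrying a body. Authentication is omitted.

```python
import json
import urllib.parse
import urllib.request

# Placeholder only: the real dev host is not stated in this task.
BASE_URL = "https://example.invalid"


def build_request(article, project="frwiki", base_url=BASE_URL):
    """Build (but do not send) a Structured-Contents request for one article."""
    body = {
        "filters": [
            {"field": "is_part_of.identifier", "value": project},
        ]
    }
    # Percent-encode the title so characters such as parentheses are safe.
    path = "/v2/structured-contents/" + urllib.parse.quote(article, safe="")
    return urllib.request.Request(
        url=base_url + path,
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",  # assumption: the endpoint accepts the filter body via POST
    )


req = build_request("Josephine_Baker")
```

The request can then be sent with urllib.request.urlopen(req) once a real host and credentials are supplied.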

This MR is ready for QA checks

I can do the QA on this today @JArguello-WMF using the previous set of French QA articles.

QA was done in dev with a different set of articles, since not all articles are present in dev. All frwiki articles tested returned an infobox. Articles tested were:
'Masechele_Caroline_Ntseliseng_Khaketla
'Masenate Mohato Seeiso
'Ndrangheta
‘O_sole_mio
"Heroes"
King_Bennie_Nawahi
Poșta_Moldovei
(1010)_Marlene
(10103)_Jungfrun
(10104)_Hoburgsgubben