
Some page previews for locations only show coordinates and no useful text on cards
Closed, Resolved · Public · Bug Report

Description

The URL https://en.wikipedia.org/api/rest_v1/page/summary/Arkansas generates an empty summary.

As discussed on enwiki at:

Some pages with coordinates (according to the reports, those with the coordinate template outside the infobox) show only the coordinates and no other text on the page preview card, like this:

[Screenshot: Screenshot_20230606_092717.png, a page preview card showing only coordinates]

Steps to replicate the issue (include links if applicable):

  • Go to https://en.wikipedia.org/wiki/Arabian_Peninsula (make sure popups are not disabled)
  • Hover over the list of countries in the second paragraph: "[[Bahrain]], [[Kuwait]], [[Oman]], [[Qatar]], [[Saudi Arabia]], the [[United Arab Emirates]] (UAE) and [[Yemen]], as well as southern [[Iraq]] and [[Jordan]]."
  • Of the above list, only Bahrain, Yemen and Jordan show the expected text summary (as of this writing); the rest show only unhelpful coordinates.

What happens?:

Coordinates are shown instead of part of the first real paragraph.

What should have happened instead?:

Showing the first paragraph.

Software version (skip for WMF-hosted wikis like Wikipedia):

Other information (browser name/version, screenshots, etc.):

My guess is that this was triggered by a change in content (either on the pages themselves or through a template such as the coord template), but the software should be able to parse the page correctly and show a meaningful summary regardless.

Event Timeline

Change 927737 had a related patch set uploaded (by Arlolra; author: Arlolra):

[mediawiki/services/mobileapps@master] [WIP] Remove .geo-inline(-hidden) elements when processing

https://gerrit.wikimedia.org/r/927737

Quiddity subscribed.

Removing User-notice, as this only affects one wiki (IIUC). Thanks though!

I've added a reasonable onwiki workaround (for now) in lieu of a full fix. It leaves the preview showing only whitespace, because TextExtracts picks up <p>\n\n</p> and emits \n\n. It would be worth considering having the 'core' API ignore elements whose only content is whitespace.

My question is, should I abandon the above patch in T338204#8906583?

@Izno Thoughts?


It's up to you. I'm not personally a fan of adding wiki-specific code to the core software if it can be avoided. I'm not sure this is a case where it can be, and clearly you already have some wiki-specific code in there.

(I do think the "maybe redundant" comments in the patch can go, they aren't redundant lines.)

See also my suggestion above: if, instead of returning whitespace only, TextExtracts returned the next-best candidate paragraph, that seems more generally useful (assuming it isn't already set up like that and something else is going wrong). That might allow the coordinates-related code in that file to be removed, partially or entirely, in favor of the current onwiki fix. I don't know how much work that would be. (I also just caught table.navbox: the .navbox class isn't guaranteed to be on a table, and in fact it isn't on English Wikipedia; nor is .infobox a div [yet] on English Wikipedia.)

Yeah, looking at this a second time, I think the most correct fix is to trim the whitespace inside the elements being assessed as candidates. Then, with the spans marked up as noexcerpt (as they are now), the whitespace-only paragraph would no longer show up in these page previews, because the candidate paragraph would be removed by rmElements(...p:empty...).
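To illustrate why trimming matters: under CSS Selectors Level 3 semantics, any non-zero-length text node (even pure whitespace) makes an element non-empty, so <p>\n\n</p> does not match p:empty and survives that removal step. The sketch below is a hypothetical Python illustration, not the mobileapps code (which is JavaScript and works on a DOM); it assumes each candidate paragraph can be modeled as its plain text content, and the function names are made up.

```python
from typing import List, Optional


def is_css_empty(text: str) -> bool:
    # Models CSS :empty for a paragraph reduced to its text content:
    # whitespace counts as content, so only the zero-length string matches.
    return text == ""


def select_candidate(paragraphs: List[str]) -> Optional[str]:
    """Return the first candidate paragraph that is non-empty after trimming.

    Hypothetical helper: trimming first turns whitespace-only paragraphs
    into empty ones, so they are skipped and the next-best candidate wins.
    """
    for text in paragraphs:
        trimmed = text.strip()
        if not is_css_empty(trimmed):
            return trimmed
    return None  # no usable candidate at all


print(is_css_empty("\n\n"))  # False: whitespace-only is not :empty
print(select_candidate(["\n\n", "  First real paragraph. "]))  # First real paragraph.
```

The trim is applied before the emptiness check on purpose: checking first and trimming afterwards would reproduce the whitespace-only summaries described in this task.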

Totally coincidentally, I happened to be reviewing the lead paragraph transform that MobileFrontend performs, and it trims whitespace from its candidate paragraphs before selection. Perhaps there is an opportunity to adjust the core service so that MF could use it as well (at some arbitrary date).

I'm back here because I got nudged onwiki at VPT, where I have also stated that we could still potentially fix this in the module; it's just that nobody has picked up my suggestion yet on Module talk:Coordinates. :(

Change 927737 abandoned by Arlolra:

[mediawiki/services/mobileapps@master] Remove .geo-inline(-hidden) elements when processing

Reason:

https://gerrit.wikimedia.org/r/927737

Just a note: this has nothing to do with the TextExtracts extension.

There's also a discussion here: https://en.m.wikipedia.org/w/index.php?title=Template_talk:Coord&oldid=1188366871#c-Izno-20231204234400-Jdlrobson-20231204231900

@Arlolra I will take a look at this, as I think there's actually a bug in the endpoint's code that decides whether a paragraph is empty.

Change 980437 had a related patch set uploaded (by Jdlrobson; author: Jdlrobson):

[mediawiki/services/mobileapps@master] [WIP] Extract lead introduction should ignore marked nodes

https://gerrit.wikimedia.org/r/980437

I've posted a patch to fix this with a new test. The logic was incorrectly identifying the lead paragraph as the paragraph that contained the coordinate. Who is able to review this?
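The fix described above (the lead paragraph should not be chosen just because it holds marked coordinate nodes) can be sketched roughly as follows. This is a hypothetical Python illustration using the standard-library html.parser, not the merged mobileapps patch. It assumes "marked" means an element carrying one of the classes mentioned in this thread (noexcerpt, geo-inline, geo-inline-hidden); the real service's list may differ.

```python
from html.parser import HTMLParser
from typing import List, Optional

# Assumption: classes treated as "marked" (excluded from the excerpt).
SKIP_CLASSES = {"noexcerpt", "geo-inline", "geo-inline-hidden"}
# Void elements have no end tag, so they must not change the skip depth.
VOID_ELEMENTS = {"br", "hr", "img", "input", "link", "meta", "wbr"}


class LeadParagraphFinder(HTMLParser):
    """Collects the text of each <p>, ignoring text inside marked elements."""

    def __init__(self) -> None:
        super().__init__()
        self.paragraphs: List[str] = []
        self._in_p = False
        self._buf: List[str] = []
        self._skip_depth = 0  # > 0 while inside a marked element

    def handle_starttag(self, tag, attrs):
        if tag == "p":
            self._in_p, self._buf = True, []
            return
        if tag in VOID_ELEMENTS:
            return
        classes = set((dict(attrs).get("class") or "").split())
        # Once inside a marked element, count every nested open tag too.
        if self._skip_depth or classes & SKIP_CLASSES:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag == "p" and self._in_p:
            self.paragraphs.append("".join(self._buf))
            self._in_p = False
            return
        if tag in VOID_ELEMENTS:
            return
        if self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if self._in_p and not self._skip_depth:
            self._buf.append(data)


def extract_lead(html: str) -> Optional[str]:
    """Return the first paragraph with visible, unmarked text, or None."""
    finder = LeadParagraphFinder()
    finder.feed(html)
    finder.close()
    for text in finder.paragraphs:
        normalized = " ".join(text.split())  # trim and collapse whitespace
        if normalized:
            return normalized
    return None


sample = (
    '<p><span class="geo-inline"><span class="noexcerpt">29°N 48°E</span>'
    "</span>\n\n</p>"
    "<p><b>Example</b> is the first real paragraph.</p>"
)
print(extract_lead(sample))  # Example is the first real paragraph.
```

Because marked text is dropped before the whitespace check, the coordinates-only paragraph collapses to nothing and the next paragraph is selected instead.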

Change 980437 merged by jenkins-bot:

[mediawiki/services/mobileapps@master] Extract lead introduction should consider marked nodes

https://gerrit.wikimedia.org/r/980437

Jgiannelos subscribed.

The patch is deployed. Summaries are pregenerated, so pages might need to be purged for their summary to be updated.

For example:

Before purging, it returns the broken summary response.
After purging, it returns content in the summary.
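One way to purge a batch of affected pages is the MediaWiki Action API's action=purge module, which must be sent as a POST. The sketch below only builds the request (it does not send it). It assumes that a standard page purge is enough to trigger regeneration of the pregenerated summary; the helper name is hypothetical.

```python
from typing import List
from urllib.parse import urlencode
from urllib.request import Request

API_URL = "https://en.wikipedia.org/w/api.php"


def build_purge_request(titles: List[str]) -> Request:
    """Build (but do not send) an action=purge POST for the given titles.

    Hypothetical helper. To actually send it, pass the result to
    urllib.request.urlopen(); Wikimedia asks clients to set a descriptive
    User-Agent header, which is omitted here.
    """
    body = urlencode({
        "action": "purge",
        "titles": "|".join(titles),  # multiple titles are pipe-separated
        "format": "json",
    }).encode("utf-8")
    return Request(
        API_URL,
        data=body,
        method="POST",
        headers={"Content-Type": "application/x-www-form-urlencoded"},
    )


req = build_purge_request(["Kuwait", "Oman"])
print(req.get_method(), req.full_url)
```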

Thanks for the quick resolution here! Paging @Izno: let me know if you get any reports of further page preview issues, as we've slightly changed the logic of how the summaries are rendered (hopefully for the better!).