
[BUG] Formatting in local descriptions is breaking use of descriptions (example: San Francisco article)
Closed, ResolvedPublic

Description

How many times were you able to reproduce it?

5/5

Steps to reproduce

  1. Search for "San Francisco"
  2. Open the article in the app

Expected results

Article is showing

Actual results

Only lead image is showing

Screenshots

IMG_00E23EF15B11-1.jpeg (2×1 px, 733 KB)

Environments observed

App version: 1403, latest
OS versions: 11.2 (all?)
Device model: iPhone X (all?)
Device language: EN


CAUSE of the iOS app not showing the article at all:

Some of our page-loading JS did not account for the API delivering newline characters in article descriptions. It escaped quotes and backslashes, but did not anticipate that descriptions would contain anything other than plain text without special characters such as newlines.
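A minimal sketch of this failure mode, written in Python for illustration rather than in the app's own JS (which is not shown in this task). The function names `naive_escape` and `safe_js_literal` are hypothetical. It assumes the description is spliced into a double-quoted JS string literal, where a raw line break is a syntax error.

```python
import json


def naive_escape(text: str) -> str:
    # Mirrors the approach described above: only backslashes and
    # double quotes are escaped; newlines pass through untouched.
    return text.replace("\\", "\\\\").replace('"', '\\"')


def safe_js_literal(text: str) -> str:
    # json.dumps returns a quoted literal with newlines and other
    # control characters written as escape sequences.
    return json.dumps(text)


# Description string as returned by the API for San Francisco.
description = "City and County in California in California\n----, United States"

naive_literal = '"' + naive_escape(description) + '"'
safe_literal = safe_js_literal(description)

# The naive literal still contains a raw line break; the safe one does not.
print("\n" in naive_literal, "\n" in safe_literal)
```

Serializing through a JSON encoder, rather than extending a hand-written escape list, covers the remaining control characters as well.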


Desktop portal showing this weird newline as well:

Screen Shot 2018-05-25 at 8.44.08 AM.png (1×1 px, 273 KB)

We have a fix that makes the iOS app resilient to this ( https://github.com/wikimedia/wikipedia-ios/pull/2332 ), but unless it is fixed upstream, the desktop portal (and other places?) would still exhibit unwanted newlines.

Event Timeline

bearND added subscribers: Tgr, bearND.

@Tgr any idea where the newlines in the description are coming from?

https://en.wikipedia.org/w/api.php?action=query&format=json&prop=description&titles=San_Francisco

"description": "City and County in California in California\n----, United States",
"descriptionsource": "local"
Mhurd updated the task description.
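For anyone checking other titles, here is a rough Python sketch that builds the query URL from the comment above and flags descriptions containing line breaks. The helper names are hypothetical, and the `query` / `pages` nesting and the page ID key in the sample payload are assumptions based on the usual action=query JSON shape; only the `description` and `descriptionsource` values come from this task.

```python
from urllib.parse import urlencode

API_ENDPOINT = "https://en.wikipedia.org/w/api.php"


def description_query_url(title: str) -> str:
    # Same parameters as the URL quoted in the comment above.
    params = {
        "action": "query",
        "format": "json",
        "prop": "description",
        "titles": title,
    }
    return API_ENDPOINT + "?" + urlencode(params)


def descriptions_with_newlines(payload: dict) -> dict:
    # Return {title: description} for pages whose description
    # contains a carriage return or line feed.
    pages = payload.get("query", {}).get("pages", {})
    flagged = {}
    for page in pages.values():
        description = page.get("description", "")
        if "\n" in description or "\r" in description:
            flagged[page.get("title", "")] = description
    return flagged


# Sample payload: the "0" page ID key and the nesting are assumed.
sample_payload = {
    "query": {
        "pages": {
            "0": {
                "title": "San Francisco",
                "description": "City and County in California in California\n----, United States",
                "descriptionsource": "local",
            }
        }
    }
}

print(description_query_url("San_Francisco"))
print(descriptions_with_newlines(sample_payload))
```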

Should we try to sanitize descriptions better? Right now HTML tags are stripped but otherwise pretty much anything that is allowed in wikitext is allowed (which is basically everything other than control characters).

@Tgr Where is the string coming from? Is there a good way to debug the local source? I was going to ping you on IRC but can't find you there.

@Tgr @bearND

Heya! @Jdforrester-WMF pointed out that this is coming from Template:Infobox_settlement ( https://en.wikipedia.org/w/index.php?title=Template:Infobox_settlement&action=edit ) and that it has other problems beyond the one we're seeing.

The parser function shouldn't let users save the page with invalid input. Expecting all downstream clients to clean up the output forever isn't really scalable. :-)

The template tries to auto-construct a value but fails quite badly (with whitespace, but also with other failures like "in California in California", and no doubt other bugs on other pages). I'd strongly recommend the product manager responsible talk to the community members about how they're breaking things for readers. :-(

JMinor added subscribers: JoeWalsh, JMinor.

It is not clear who owns this. Having "the product manager responsible talk to the community members about how they're breaking things for readers" doesn't seem like a long-term solution, as someone could come along and do this again.

I agree that it should be fixed either in the API layer or in preventing this at edit save time. This affects not just iOS but other clients/consumers as well, who expect these descriptions to conform to the format guarantees of Wikidata fields.

JMinor renamed this task from [BUG] San Francisco article showing with only the lead image to [BUG] Formatting in local descriptions is breaking use of descriptions (example: San Francisco article). May 25 2018, 5:50 PM
JMinor triaged this task as High priority.
JMinor added a subscriber: DannyH.

Is there a good way to debug the local source?

Not really. action=raw&templates=expand doesn't really help here as {{SHORTDESC}} is a side-effect-only parser function that gets "expanded" to nothing. I guess if you want to go off the deep end you can open a MediaWiki shell on a debug host, redefine the parser function to output some unique token, parse the article, and look for the token in the output...

Although if you know that the shortdescription class is used by the template enwiki uses to wrap {{SHORTDESC}} then looking at the expanded wikitext does help.

I was going to ping you on IRC but can't find you there.

I should be there, both as tgr and tgr_ (the latter is the Matrix client; not on many channels but can be DM'd).

The parser function shouldn't let users save the page with invalid input. Expecting all downstream clients to clean up the output forever isn't really scalable. :-)

Define invalid. Having newlines in a description does not seem unreasonable (OTOH Wikidata probably does not allow it and we were going for parity with that...)

I agree that it should be fixed either in the API layer or in preventing this at edit save time.

Preventing an edit because a lua module in a template decided to output a newline does not seem very nice. Probably no harm in just removing newlines (or replacing them with a space and then collapsing whitespace).
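A small Python sketch of the second option (replace newlines with a space, then collapse whitespace). This only illustrates the idea and is not the fix that was eventually implemented; the function name `normalize_description` is hypothetical.

```python
import re


def normalize_description(text: str) -> str:
    # Collapse every run of whitespace (spaces, tabs, newlines) into a
    # single space, then trim leading and trailing whitespace.
    return re.sub(r"\s+", " ", text).strip()


raw = "City and County in California in California\n----, United States"
print(normalize_description(raw))
# prints: City and County in California in California ----, United States
```

This removes the line break but leaves the stray "----" and the doubled "in California", which would still need fixing in the template.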

There was some back-and-forth between the WMF and the community on the appropriateness of mass-adding short descriptions via a template but I'm not sure what the end result was. Danny probably knows.

This affects not just iOS but other clients/consumers as well, who expect these descriptions to conform to the format guarantees of Wikidata fields.

I'm not sure what those are though. Does Wikidata actually prevent you from using newlines?

Define invalid. Having newlines in a description does not seem unreasonable

It seems presently nothing is preventing even multiple newlines, which seems nutty/invalid. Imagine the example below with a newline after each word - City \n and \n County \n in \n California etc...

Screen Shot 2018-05-25 at 8.44.08 AM.png (1×1 px, 273 KB)

I can do the communication here. The fact that the template is actually breaking specific pages will get people's attention. Do we know exactly what the template is doing wrong?

I posted a message on the Template:Short description talk page:

https://en.wikipedia.org/wiki/Wikipedia_talk:WikiProject_Short_descriptions#Infobox_settlement

I'm asking how the descriptions are constructed, and how they could be debugged, or fixed by an editor.

Vvjjkkii renamed this task from [BUG] Formatting in local descriptions is breaking use of descriptions (example: San Francisco article) to cbcaaaaaaa. Jul 1 2018, 1:07 AM
Vvjjkkii updated the task description.
Vvjjkkii removed a subscriber: Aklapper.
bearND renamed this task from cbcaaaaaaa to [BUG] Formatting in local descriptions is breaking use of descriptions (example: San Francisco article). Jul 1 2018, 4:14 AM
bearND updated the task description.
Jhernandez changed the task status from Open to Stalled. Jan 16 2019, 4:46 PM
Jhernandez lowered the priority of this task from High to Low.
Jhernandez subscribed.

Lowering priority given the lack of activity

Looks like there has been no community concern/feedback about this, and no further reports of issues, so it seems like @Tgr's subtask of stripping newlines has resolved this?

I think there were some concerns in grooming that led to keeping this open, but I don't remember what those were. Thoughts @Mholloway @bearND @Tgr?

If nothing comes to mind, I'm happy to resolve.

The goal of the subtask was to ensure people can't put things in the description override that they couldn't put into the Wikidata description (newlines, specifically). The question is, is that good enough? There was some discussion of filtering out text that looks to the user like HTML tags (e.g. <p>). But now I think I have confused bugs and that was discussed somewhere else.

I'm fine with closing. Personally, I prefer matching Wikidata restrictions and letting the community deal with other kinds of strangeness (which usually comes from a broken template or Lua module, so it's good to get it fixed anyway).

Jhernandez claimed this task.

Being bold then. I think the other task is comprehensive and still in discussion with your last comments, so let's continue there 👍