Page MenuHomePhabricator

Gracefully handle broken ExternalData services
Open, MediumPublic

Description

If for some reason one of the ExternalData URLs return an error, kartographer should simply skip it (could also send an eventlog msg) , and continue showing all other overlays.

Related incident: https://wikitech.wikimedia.org/wiki/Incident_documentation/2017-06-01_Maps

Related Objects

Event Timeline

Restricted Application added a subscriber: Aklapper. · View Herald Transcript

@MaxSem no, this is more of a full review that we should do to ensure graceful service degradation. So that if a map uses multiple external data items, and one of them fail, what should the map show - everything else? nothing but the base map? Only locally defined data? What if maplink's data loading failed - should we abort map popup? Or show the base map at that location?

What I suggest is:

  • We show as many valid data layers as we can.
  • In case there was an error, and some map data is missing, we display an error message such as: An issue happened while loading map data.. In practice, it could look like:
iphone 6 view
Screen Shot 2016-10-26 at 4.15.16 PM.png (1×642 px, 379 KB)

I did it like this to present an early prototype, but we may want to display a toggle button in the top-right corner.

  • In addition, if there is an error, I suggest we send an error event via EventLogging so we can track these internally. Advice?

I like the error box - it should be shown until user interacts with it. This will be the mode of operation for non-static mapframes and all maplinks. For static mapframes, we need to implement the same functionality on the server, which might be a bit tricky.
And yes, it should probably be sent to logstash - i am not sure how to report it - apparently mw.log() is a noop in non-debug mode.

Change 320538 had a related patch set uploaded (by JGirault):
Gracefully handle broken ExternalData groups, by not crashing and displaying a user-friendly warning

https://gerrit.wikimedia.org/r/320538

Latest update display a warning alert such as:

DesktopMobile
Screen Shot 2016-11-08 at 3.18.57 PM.png (527×624 px, 202 KB)
Screen Shot 2016-11-08 at 3.25.05 PM.png (669×378 px, 124 KB)

An issue happened while loading map data.
^ @Yurik, @debt, @CKoerner_WMF, @MaxSem : what do you think of the message itself? Current one is quite vague... any better suggestion?

The warning alert is not looking exactly as neat as the mockup provided by @RHo, and supported by the Design group so far (until we decide true guidelines).
The big difference is in the icon: it's displaying the OOjs UI black alert icon instead of a yellow circle one. FWIW, the shape and the color of the icon are under discussion.

I'm a little concerned that there isn't an action for the user to do. If they just refresh - will the map appear normally? Or is this message meant for a more (fatal) error?

@debt, it could be both - the message could mean that the SPARQL query has been incorrectly written, and the service reports an error, or it could mean that there is a network connection problem and the result of that query could not be downloaded. And it might not be trivial to distinguish the two. We could eventually handle data download problems differently and report a different error... @JGirault, is that even possible to tell what problem occurred?

To echo @debt, is there a way to link the warning to documentation (on-wiki) on what the message means, common causes, and possible resolutions?

"You may have come across this page after seeing the warning message. This error message occurs when some data present in a map can not be loaded. Below you can find common causes for the error and possible fixes."

Similar to when you're watching a video on YouTube and the connection is poor. A little message appears and links to this page:

https://support.google.com/youtube/answer/74662?hl=en

We may want to set up Help:Extension:Kartographer/FAQ for the common problems? Problem with on-wiki documentation is that it may change at any moment, pages could be moved, and we will have to keep updating the links. We might want to link to the whole page instead?

I'm a little concerned that there isn't an action for the user to do. If they just refresh - will the map appear normally? Or is this message meant for a more (fatal) error?

As @Yurik said, there are 2 possibilities:

  • A query issue, in which case any page refresh will have no impact. As long as the query fails, the service will return an error, and the warning message will pop up. The only action there is to go fix the query.
  • A network issue, in which case a page refresh may or may not solve the problem. I guess the highest probability is that the service is temporarily down, and will still be down the next time.

We could eventually handle data download problems differently and report a different error... @JGirault, is that even possible to tell what problem occurred?

It is.

The geoshape service throws a 400 Bad Request in case of bad query, and the mapdata library is aware of that error code (or any error code that would be returned by a service). It's only a matter of sharing that information (the error code) over to the extension code (Kartographer), and then Kartographer will be able to format a specific message per error code.

FWIW, see wikipedia article on HTTP Error codes.

Deskana subscribed.

This is not active, and has been stalled for well over a month now, so moving back in to the backlog.

Moving off the sprint board - the Discovery team won't be able to finish this work at this time.

Change 320538 abandoned by Sbisson:
Gracefully handle broken ExternalData groups, by not crashing and displaying a user-friendly warning

Reason:
This change doesn't seem to be under active development. Please restore if I'm mistaken.

https://gerrit.wikimedia.org/r/320538

Krinkle updated the task description. (Show Details)
Krinkle subscribed.

Unresolved incident followup. Tagging for triage per mw:Maintainers. I realize it was tagged before, but was untagged without explanation. If a different team is expected to handle this instead, please indicate whom, or escalate via management as needed.