Properly handle the networking errors and aborted requests
Closed, ResolvedPublic

Description

We should not log the aborted requests as PagePreviewsAPIFailure because that makes API Failures chart unusable. After deployment of T197700, the chart went from <10 events per minute to 200-400 events per minute and the graph doesn't present API failures. The noise generated by aborted requests is too big. The abort action should not increment the PagePreviewsApiFailure because it is not a request failure.

Developer notes

The catch logic in actions.js is incorrect. result parameter will be always undefined as the only place that causes catch() statement to be executed is this this throw. Current logic assumes that every time we land in catch((err, result) => { ... }) scope is not a network issue (because of the fact that result is always not defined) and it shows the generic page preview. Every fetch exception causes FETCH_COMPLETE state, which triggers the generic preview state.

Because of that, we cannot:

  • determine if this is an offline issue and not show the page preview
  • it's not possible to not call FETCH_COMPLETE when the request was aborted
Bonus points:

Rest gateway shouldn't know how to handle not found issues

Acceptance criteria:
  • If I go to a page with Page Previews on, change to offline mode in the networking tab, hover a link - it should not show the Page Preview
  • When a request abort happens, Page Previews should not switch into FETCH_COMPLETE state
  • When a request is aborted, the counter.PagePreviewsApiFailure should not be incremented
  • when the server (both Mediawiki and Restbase) returns 4xx or 5xx error the Page Preview should be visible (the generic preview)
NOTE: This last AC can be verified after the MediaWiki train deployment on Thursday, 26th.
  • Post-deployment: The API Failures graph should get back to normal.

QA

  • Test on staging.
  • Drop connection, no page preview should show on any link
  • Restore connection, page previews should show on all links
  • When I hover over "List of D. C. Thomson & Co. Ltd publications" on http://reading-web-staging.wmflabs.org/wiki/Special:AllPages I should see an error (no lead section)
  • If page doesn't exist, a broken page preview will show e.g. Timed test 2
  • If a page preview summary request results in a 500 error then a broken page preview will show (not sure how to test this one without changing code, but JR checked this manually)
pmiazga created this task.Jul 12 2018, 9:59 PM
Restricted Application added a subscriber: Aklapper. · View Herald TranscriptJul 12 2018, 9:59 PM

Change 445543 had a related patch set uploaded (by Pmiazga; owner: Pmiazga):
[mediawiki/extensions/Popups@master] Properly handle catch() when calling gateway fetch.

https://gerrit.wikimedia.org/r/445543

pmiazga updated the task description. (Show Details)Jul 12 2018, 10:20 PM

Change 445545 had a related patch set uploaded (by Pmiazga; owner: Pmiazga):
[mediawiki/extensions/Popups@master] When request gets aborted it shouldn't finish with FETCH_FAILED

https://gerrit.wikimedia.org/r/445545

Change 445543 merged by jenkins-bot:
[mediawiki/extensions/Popups@master] Properly handle catch() when calling gateway fetch.

https://gerrit.wikimedia.org/r/445543

pmiazga claimed this task.Jul 13 2018, 12:54 PM
pmiazga updated the task description. (Show Details)Jul 13 2018, 4:44 PM
Jdlrobson updated the task description. (Show Details)Jul 13 2018, 5:33 PM
Jdlrobson reassigned this task from pmiazga to ABorbaWMF.
Jdlrobson added subscribers: ABorbaWMF, Ryasmeen, Jdlrobson.

Skipping design review. @Ryasmeen @ABorbaWMF could we do a generic page previews QA for error handling? We'll want to check that page previews do not show when the user's connection drops and continue to show when the connection is live. I've added some QA notes, let me know if I can help with anything more.

Change 445545 merged by jenkins-bot:
[mediawiki/extensions/Popups@master] When request gets aborted it shouldn't finish with FETCH_FAILED

https://gerrit.wikimedia.org/r/445545

Niedzielski reassigned this task from ABorbaWMF to Ryasmeen.Jul 16 2018, 6:23 PM

Checked all the acceptance criteria except the last one, and they are passing.

phuedx updated the task description. (Show Details)Jul 20 2018, 2:03 PM
phuedx added a subscriber: phuedx.

☝️ per T199482#4439716.

Checked all the acceptance criteria except the last one, and they are passing.

The last AC can be checked after next week's train rolls. This week's MediaWiki train deployed was blocked by T199941: Fatal MWException in Babel: "Language::isValidBuiltInCode must be passed a string" so we won't see the significant dip in the API Failures graph that we're expecting.

phuedx updated the task description. (Show Details)Jul 20 2018, 2:31 PM

When a request is aborted, the counter.PagePreviewsApiFailure should not be incremented

You can see from the above that a successful requests resulted in us logging that the response time was 213 ms this time (PagePreviewsApiResponse=213ms).

Glancing over the same link twice resulted in two aborted requests with no corresponding logging of the response time.

phuedx updated the task description. (Show Details)Jul 20 2018, 2:51 PM

When a request abort happens, Page Previews should not switch into FETCH_COMPLETE state

☝️ Note the FETCH_ABORTED state rather than the _COMPLETE state.

It was quite hard to verify that this happened in production as the Page Summary endpoint was responding in ~50 ms!


@Ryasmeen: Since hewiki has had this code deployed (only the group 2 wikis weren't deployed to on Thursday), I'm browsing hewiki with ResourceLoader's debug mode enabled, e.g. https://he.wikipedia.org/wiki/%D7%A2%D7%9E%D7%95%D7%93_%D7%A8%D7%90%D7%A9%D7%99?debug=true. When that mode is enabled, Page Previews will use the Redux DevTools extension if it's available and you can inspect its state.

phuedx updated the task description. (Show Details)Jul 20 2018, 3:16 PM
phuedx claimed this task.
Jdlrobson moved this task from Backlog to Done on the Page-Previews board.Jul 25 2018, 8:36 AM

T191059: 1.32.0-wmf.13 deployment blockers has been resolved so I should be able to check the last AC.

phuedx updated the task description. (Show Details)Jul 25 2018, 10:15 AM
phuedx closed this task as Resolved.Jul 25 2018, 10:22 AM

Great work, y'all 🎉🎉🎉

Restricted Application added a project: User-Ryasmeen. · View Herald TranscriptJul 25 2018, 10:22 AM