Page MenuHomePhabricator

Properly handle the networking errors and aborted requests
Closed, ResolvedPublic

Description

We should not log the aborted requests as PagePreviewsAPIFailure because that makes API Failures chart unusable. After deployment of T197700, the chart went from <10 events per minute to 200-400 events per minute and the graph doesn't present API failures. The noise generated by aborted requests is too big. The abort action should not increment the PagePreviewsApiFailure because it is not a request failure.

Developer notes

The catch logic in actions.js is incorrect. result parameter will be always undefined as the only place that causes catch() statement to be executed is this this throw. Current logic assumes that every time we land in catch((err, result) => { ... }) scope is not a network issue (because of the fact that result is always not defined) and it shows the generic page preview. Every fetch exception causes FETCH_COMPLETE state, which triggers the generic preview state.

Because of that, we cannot:

  • determine if this is an offline issue and not show the page preview
  • it's not possible to not call FETCH_COMPLETE when the request was aborted
Bonus points:

Rest gateway shouldn't know how to handle not found issues

Acceptance criteria:
  • If I go to a page with Page Previews on, change to offline mode in the networking tab, hover a link - it should not show the Page Preview
  • When a request abort happens, Page Previews should not switch into FETCH_COMPLETE state
  • When a request is aborted, the counter.PagePreviewsApiFailure should not be incremented
  • when the server (both Mediawiki and Restbase) returns 4xx or 5xx error the Page Preview should be visible (the generic preview)
NOTE: This last AC can be verified after the MediaWiki train deployment on Thursday, 26th.
  • Post-deployment: The API Failures graph should get back to normal.

QA

  • Test on staging.
  • Drop connection, no page preview should show on any link
  • Restore connection, page previews should show on all links
  • When I hover over "List of D. C. Thomson & Co. Ltd publications" on http://reading-web-staging.wmflabs.org/wiki/Special:AllPages I should see an error (no lead section)
  • If page doesn't exist, a broken page preview will show e.g. Timed test 2
  • If a page preview summary request results in a 500 error then a broken page preview will show (not sure how to test this one without changing code, but JR checked this manually)

Event Timeline

Change 445543 had a related patch set uploaded (by Pmiazga; owner: Pmiazga):
[mediawiki/extensions/Popups@master] Properly handle catch() when calling gateway fetch.

https://gerrit.wikimedia.org/r/445543

Change 445545 had a related patch set uploaded (by Pmiazga; owner: Pmiazga):
[mediawiki/extensions/Popups@master] When request gets aborted it shouldn't finish with FETCH_FAILED

https://gerrit.wikimedia.org/r/445545

Change 445543 merged by jenkins-bot:
[mediawiki/extensions/Popups@master] Properly handle catch() when calling gateway fetch.

https://gerrit.wikimedia.org/r/445543

Jdlrobson updated the task description. (Show Details)
Jdlrobson added subscribers: ABorbaWMF, Ryasmeen, Jdlrobson.

Skipping design review. @Ryasmeen @ABorbaWMF could we do a generic page previews QA for error handling? We'll want to check that page previews do not show when the user's connection drops and continue to show when the connection is live. I've added some QA notes, let me know if I can help with anything more.

Change 445545 merged by jenkins-bot:
[mediawiki/extensions/Popups@master] When request gets aborted it shouldn't finish with FETCH_FAILED

https://gerrit.wikimedia.org/r/445545

Checked all the acceptance criteria except the last one, and they are passing.

Checked all the acceptance criteria except the last one, and they are passing.

The last AC can be checked after next week's train rolls. This week's MediaWiki train deployed was blocked by T199941: Fatal MWException in Babel: "Language::isValidBuiltInCode must be passed a string" so we won't see the significant dip in the API Failures graph that we're expecting.

When a request is aborted, the counter.PagePreviewsApiFailure should not be incremented

Screen Shot 2018-07-20 at 15.30.43.png (63×820 px, 24 KB)

You can see from the above that a successful requests resulted in us logging that the response time was 213 ms this time (PagePreviewsApiResponse=213ms).

Glancing over the same link twice resulted in two aborted requests with no corresponding logging of the response time.

When a request abort happens, Page Previews should not switch into FETCH_COMPLETE state

Screen Shot 2018-07-20 at 15.50.59.png (156×359 px, 14 KB)

☝️ Note the FETCH_ABORTED state rather than the _COMPLETE state.

It was quite hard to verify that this happened in production as the Page Summary endpoint was responding in ~50 ms!


@Ryasmeen: Since hewiki has had this code deployed (only the group 2 wikis weren't deployed to on Thursday), I'm browsing hewiki with ResourceLoader's debug mode enabled, e.g. https://he.wikipedia.org/wiki/%D7%A2%D7%9E%D7%95%D7%93_%D7%A8%D7%90%D7%A9%D7%99?debug=true. When that mode is enabled, Page Previews will use the Redux DevTools extension if it's available and you can inspect its state.

phuedx updated the task description. (Show Details)

T191059: 1.32.0-wmf.13 deployment blockers has been resolved so I should be able to check the last AC.

Great work, y'all 🎉🎉🎉