
restbase: "featured" endpoint times out
Closed, Resolved (Public)

Description

The "featured" (article of the day) restbase endpoint started timing out:

/en.wikipedia.org/v1/feed/featured/{yyyy}/{mm}/{dd} (Retrieve aggregated feed content for April 29, 2016) timed out
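The path above is a template. Here is a minimal Python sketch of how its date placeholders are filled (zero-padded). It reproduces the concrete path the check requested for April 29, 2016. The function and constant names are hypothetical. Only the template itself comes from the check output.

```python
from datetime import date

# Template copied from the check output; "en.wikipedia.org" and "v1" are the
# values seen in this task (hypothetical constant name).
FEATURED_TEMPLATE = "/{domain}/v1/feed/featured/{yyyy}/{mm}/{dd}"


def featured_feed_path(domain: str, day: date) -> str:
    """Fill the featured-feed template with a zero-padded year/month/day."""
    return FEATURED_TEMPLATE.format(
        domain=domain,
        yyyy=f"{day.year:04d}",
        mm=f"{day.month:02d}",
        dd=f"{day.day:02d}",
    )


print(featured_feed_path("en.wikipedia.org", date(2016, 4, 29)))
```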

This happened shortly after the deployment of https://gerrit.wikimedia.org/r/612396 which set wgForceHTTPS to true by default.

The local Icinga NRPE command is:

/usr/local/bin/check-restbase

For example, on restbase2017:

 /usr/local/bin/check-restbase
WARNING:urllib3.connectionpool:Retrying (Retry(total=2, connect=None, read=None, redirect=None)) after connection broken by 'ReadTimeoutError("HTTPConnectionPool(host='10.192.48.119', port=7231): Read timed out. (read timeout=5)",)': /en.wikipedia.org/v1/feed/featured/2016/04/29
/en.wikipedia.org/v1/feed/featured/{yyyy}/{mm}/{dd} (Retrieve aggregated feed content for April 29, 2016) timed out before a response was received
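The log shows urllib3 retrying after a read timeout of 5 seconds. The following is a rough, stdlib-only Python approximation of that "read timeout plus a few retries" pattern, useful for reproducing the symptom by hand. It is not the check-restbase implementation. The name fetch_with_retries and the default of 2 retries are assumptions.

```python
import socket
import urllib.error
import urllib.request


def fetch_with_retries(url: str, read_timeout: float = 5.0, retries: int = 2):
    """Return (status, body); raise TimeoutError after retries + 1 timed-out attempts.

    The timeout applies to each socket operation (connect, each read),
    not to the request as a whole.
    """
    last_exc = None
    for _attempt in range(retries + 1):
        try:
            with urllib.request.urlopen(url, timeout=read_timeout) as resp:
                return resp.status, resp.read()
        except (socket.timeout, TimeoutError) as exc:
            # Raised directly when the server is slow to send the response.
            last_exc = exc
        except urllib.error.URLError as exc:
            # A connect timeout is wrapped in URLError; other errors propagate.
            if isinstance(exc.reason, (socket.timeout, TimeoutError)):
                last_exc = exc
            else:
                raise
    raise TimeoutError(f"{url} timed out after {retries + 1} attempts") from last_exc
```

Against a live host this would be called as, for example, fetch_with_retries("http://10.192.48.119:7231/en.wikipedia.org/v1/feed/featured/2016/04/29").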

Node.js is listening on port 7231 and accepts manual connections.

[Screenshot: 2020-07-13 20-40-12.png]

On https://en.wikipedia.org/api/rest_v1/#/Feed/aggregatedFeed, selecting "aggregated daily featured content", clicking "try it out", filling in some values and submitting also times out.

But other things work, like:

< Krinkle> wikifeeds API
< Krinkle> and https://en.wikipedia.org/api/rest_v1/feed/onthisday/events/04/11 also works fine
< Krinkle> and regular restbase/page html also fine: https://en.wikipedia.org/api/rest_v1/page/html/Foobar

Event Timeline

Note that this endpoint is also marked as "unstable" on the API page.

On restbase2017:

times out: /usr/bin/service-checker-swagger -t 5 10.192.48.119 http://10.192.48.119:7231/en.wikipedia.org/v1

works: curl http://10.192.48.119:7231/en.wikipedia.org/v1/

If you wait about 4 minutes, curl eventually returns content. service-checker-swagger, however, fails fairly quickly even when the -t value is raised substantially.

Joe triaged this task as Unbreak Now! priority. (Jul 14 2020, 6:29 AM)
Joe added a subscriber: Joe.

The homepage of the mobile applications has been broken since last night.

The feeds that do work only appear to do so because they are cached at the edge and/or in RESTBase, not because wikifeeds (and termbox, for that matter) are working.

The problem has been identified:

With the introduction of I80ca62643f5c, the MediaWiki API returns a 302 redirect to HTTPS for every request that does not include the X-Forwarded-Proto: https header. This makes termbox and wikifeeds try to reach the sites via the edge addresses, and those connections time out because egress from Kubernetes is whitelisted.
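The redirect rule described above can be modelled in a few lines. This is a simplified Python illustration, not MediaWiki's wgForceHTTPS code. It answers any request lacking X-Forwarded-Proto: https with a 302 to the HTTPS URL on the public hostname. The function name and the example path /w/api.php are assumptions.

```python
from typing import Mapping, Optional, Tuple


def force_https_response(host: str, path: str,
                         headers: Mapping[str, str]) -> Optional[Tuple[int, str]]:
    """Return (302, location) if the request should be bounced to HTTPS,
    or None if it may be served as-is.

    Simplified model: a TLS-terminating proxy in front of the app server
    reports the client's original scheme in X-Forwarded-Proto.
    """
    # HTTP header names are case-insensitive.
    normalized = {k.lower(): v for k, v in headers.items()}
    if normalized.get("x-forwarded-proto", "").lower() == "https":
        return None
    return 302, f"https://{host}{path}"


# An internal service calling the API over plain HTTP, without the header:
print(force_https_response("en.wikipedia.org", "/w/api.php", {}))
```

A client that follows the resulting Location header will try to connect to the public (edge) address of en.wikipedia.org. From inside Kubernetes that egress is not allowed, so the request hangs until it times out instead of failing fast.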

Change 612486 had a related patch set uploaded (by Giuseppe Lavagetto; owner: Giuseppe Lavagetto):
[operations/mediawiki-config@master] Revert "Enable wgForceHTTPS and wgCookieSameSite='None' (Phase 4)"

https://gerrit.wikimedia.org/r/612486

Change 612486 merged by jenkins-bot:
[operations/mediawiki-config@master] Revert "Enable wgForceHTTPS and wgCookieSameSite='None' (Phase 4)"

https://gerrit.wikimedia.org/r/612486

Mentioned in SAL (#wikimedia-operations) [2020-07-14T07:32:28Z] <oblivian@deploy1001> sync-file aborted: revert forcehttps in an attempt to fix T257887 (duration: 00m 20s)

Mentioned in SAL (#wikimedia-operations) [2020-07-14T07:48:19Z] <oblivian@deploy1001> Synchronized wmf-config/InitialiseSettings.php: revert forcehttps in an attempt to fix T257887 (duration: 01m 06s)

Joe lowered the priority of this task from Unbreak Now! to High.

Resetting to high since we've fixed the immediate problem by reverting the MediaWiki patch.

Before we roll it out again we need:

  • Fix wikifeeds to call the API via HTTPS
  • Fix the scap configuration so that it calls https://en.wikipedia.org
  • Fix pybal's configuration for the HTTP pool

The termbox issues do not seem to be strictly connected to that patch.
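Before re-enabling the setting, it helps to find every service setting that still points at a plain http:// URL. The same concern comes up again later for changeprop, eventgate (stream_config_url) and mobileapps. Below is a minimal Python sketch that walks a parsed configuration (for example, values loaded from a YAML chart) and reports such URLs. The function name, the key names other than stream_config_url, and the example hostnames are all hypothetical.

```python
from typing import Any, Iterator, Tuple


def plain_http_urls(config: Any, prefix: str = "") -> Iterator[Tuple[str, str]]:
    """Yield (dotted_key, value) for every string value that starts with http://."""
    if isinstance(config, dict):
        for key, value in config.items():
            child = f"{prefix}.{key}" if prefix else str(key)
            yield from plain_http_urls(value, child)
    elif isinstance(config, list):
        for index, value in enumerate(config):
            yield from plain_http_urls(value, f"{prefix}[{index}]")
    elif isinstance(config, str) and config.startswith("http://"):
        yield prefix, config


# Hypothetical values, shaped like a service's chart configuration.
example = {
    "service": {
        "mw_api_uri": "http://mw-api.example.internal/w/api.php",
        "restbase_uri": "https://restbase.example.internal",
    },
    "stream_config_url": "http://meta.example.internal/w/api.php",
    "ca_files": ["/etc/ssl/ca.pem"],
}

for key, url in plain_http_urls(example):
    print(f"{key}: {url}")
```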

Change 612513 had a related patch set uploaded (by Giuseppe Lavagetto; owner: Giuseppe Lavagetto):
[operations/deployment-charts@master] wikifeeds: use the puppet CA if available, call the mw api via https

https://gerrit.wikimedia.org/r/612513

Change 612513 merged by Giuseppe Lavagetto:
[operations/deployment-charts@master] wikifeeds: use the puppet CA if available, call the mw api via https

https://gerrit.wikimedia.org/r/612513
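Calling the API over HTTPS from inside the cluster means the client must trust the certificate presented by the internal endpoint. That is the reason for "use the puppet CA if available". wikifeeds itself is a Node.js service and the actual change is in its deployment chart. What follows is only a Python analogue of the idea. The function name and the CA file path are assumptions.

```python
import os
import ssl
from typing import Optional


def api_ssl_context(ca_path: Optional[str]) -> ssl.SSLContext:
    """Verify server certificates against an internal CA bundle when one is
    mounted at ca_path; otherwise fall back to the system trust store.

    Both branches keep certificate and hostname verification enabled.
    """
    if ca_path and os.path.isfile(ca_path):
        return ssl.create_default_context(cafile=ca_path)
    return ssl.create_default_context()


# Hypothetical mount point for the internal CA certificate.
ctx = api_ssl_context("/etc/ssl/certs/internal-ca.pem")
print(ctx.verify_mode == ssl.CERT_REQUIRED, ctx.check_hostname)
```

The context would then be passed to the HTTP client, for example urllib.request.urlopen(url, context=ctx).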

Change 612521 had a related patch set uploaded (by Giuseppe Lavagetto; owner: Giuseppe Lavagetto):
[operations/deployment-charts@master] termbox: use https to reach the api, the puppet CA where needed.

https://gerrit.wikimedia.org/r/612521

Change 612521 merged by jenkins-bot:
[operations/deployment-charts@master] termbox: use https to reach the api, the puppet CA where needed.

https://gerrit.wikimedia.org/r/612521

Change 612535 had a related patch set uploaded (by JMeybohm; owner: JMeybohm):
[operations/deployment-charts@master] Remove cluster specific uri's (as they are not cluster specific)

https://gerrit.wikimedia.org/r/612535

Change 612535 merged by jenkins-bot:
[operations/deployment-charts@master] Remove cluster specific uri's (as they are not cluster specific)

https://gerrit.wikimedia.org/r/612535

Change 612538 had a related patch set uploaded (by JMeybohm; owner: JMeybohm):
[operations/deployment-charts@master] Include private/general.yaml for staging as well

https://gerrit.wikimedia.org/r/612538

Change 612538 merged by jenkins-bot:
[operations/deployment-charts@master] Include private/general.yaml for staging as well

https://gerrit.wikimedia.org/r/612538

Change 612540 had a related patch set uploaded (by Giuseppe Lavagetto; owner: Giuseppe Lavagetto):
[operations/puppet@production] scap: check enwiki via https, not http

https://gerrit.wikimedia.org/r/612540

Change 612541 had a related patch set uploaded (by Giuseppe Lavagetto; owner: Giuseppe Lavagetto):
[operations/puppet@production] pybal: check wikidata, not enwiki and expect 302

https://gerrit.wikimedia.org/r/612541

Change 612540 merged by Giuseppe Lavagetto:
[operations/puppet@production] scap: check enwiki via https, not http

https://gerrit.wikimedia.org/r/612540

Change 612541 merged by Giuseppe Lavagetto:
[operations/puppet@production] pybal: check wikidata, not enwiki and expect 302

https://gerrit.wikimedia.org/r/612541
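With forced HTTPS, a plain-HTTP health check against the app servers receives a 302 rather than a 200. A check must therefore treat the redirect as the healthy answer and must not follow it. pybal's actual check configuration is not shown in this task. Here is a rough Python sketch of the idea: send one GET with a chosen Host header and compare the status code, with no redirect following (http.client never follows redirects). The function name and the path are hypothetical.

```python
import http.client


def check_status(host: str, port: int, path: str, expected: int,
                 host_header: str, timeout: float = 5.0) -> bool:
    """Send a single GET and report whether the status equals `expected`.

    Redirects are not followed, so a 302 is seen as-is.
    """
    conn = http.client.HTTPConnection(host, port, timeout=timeout)
    try:
        # An explicit Host header replaces the one http.client would add.
        conn.request("GET", path, headers={"Host": host_header})
        resp = conn.getresponse()
        resp.read()
        return resp.status == expected
    finally:
        conn.close()
```

Against an app server this would be called as, for example, check_status("10.0.0.1", 80, "/wiki/Main_Page", 302, "www.wikidata.org"), where the address is a placeholder.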


Termbox was basically the same issue, but it was not covered by the revert because wiki groups were switched in multiple phases (and we only reverted the last one).

Termbox and wikifeeds are now configured to call the API via HTTPS.

Change 612492 had a related patch set uploaded (by Giuseppe Lavagetto; owner: Giuseppe Lavagetto):
[operations/mediawiki-config@master] Revert "Revert "Enable wgForceHTTPS and wgCookieSameSite='None' (Phase 4)""

https://gerrit.wikimedia.org/r/612492

Change 612492 merged by jenkins-bot:
[operations/mediawiki-config@master] Revert "Revert "Enable wgForceHTTPS and wgCookieSameSite='None' (Phase 4)""

https://gerrit.wikimedia.org/r/612492

Mentioned in SAL (#wikimedia-operations) [2020-07-14T12:57:40Z] <oblivian@deploy1001> Synchronized wmf-config/InitialiseSettings.php: revert forcehttps after fixing T257887 (duration: 01m 02s)

Change 612588 had a related patch set uploaded (by Alexandros Kosiaris; owner: Alexandros Kosiaris):
[operations/deployment-charts@master] changeprop: Talk to the API over HTTPS

https://gerrit.wikimedia.org/r/612588

Change 612594 had a related patch set uploaded (by Alexandros Kosiaris; owner: Alexandros Kosiaris):
[operations/deployment-charts@master] eventgate: Switch stream_config_url to https

https://gerrit.wikimedia.org/r/612594

Change 612588 merged by jenkins-bot:
[operations/deployment-charts@master] changeprop: Talk to the API over HTTPS

https://gerrit.wikimedia.org/r/612588

Change 612600 had a related patch set uploaded (by Alexandros Kosiaris; owner: Alexandros Kosiaris):
[operations/deployment-charts@master] mobileapps: Talk to API over HTTPS

https://gerrit.wikimedia.org/r/612600

Change 612594 merged by jenkins-bot:
[operations/deployment-charts@master] eventgate: Switch stream_config_url to https

https://gerrit.wikimedia.org/r/612594

Change 612600 merged by jenkins-bot:
[operations/deployment-charts@master] mobileapps: Talk to API over HTTPS

https://gerrit.wikimedia.org/r/612600