Restbase routing down on beta, 2020-02-07
Closed, Resolved, Public

Description

It looks like Restbase might be down on beta [1]

Expected behavior

  1. Go to https://en.wikipedia.beta.wmflabs.org/wiki/Talk:Cats
  2. Click a "Reply" link
  3. Draft a reply
  4. Click "Reply"

✅ 5. Comment is successfully published to https://en.wikipedia.beta.wmflabs.org/wiki/Talk:Cats

Actual behavior

  1. Go to https://en.wikipedia.beta.wmflabs.org/wiki/Talk:Cats
  2. Click a "Reply" link
  3. Draft a reply
  4. Click "Reply"

❌ 5. The following warning appears above the reply text input area: "Invalid response from server."


[1] https://en.wikipedia.beta.wmflabs.org/api/rest_v1/page/html/Talk%3AChat/412808?redirect=false&stash=true

Event Timeline

Looks like a routing problem. The deployment-restbase0* instances are working fine, as is restbase-beta.wmflabs.org (try it), but the Beta Cluster (BC) no longer appears to know how to handle /api/rest_v1/ paths.

Ryasmeen triaged this task as Unbreak Now! priority. Feb 7 2020, 8:59 PM

I am raising it as UBN, since all testing activities are stalled because of this.

Jdforrester-WMF renamed this task from Restbase [might be] down on beta to Restbase down on beta, 2020-02-07. Feb 7 2020, 9:24 PM
Jdforrester-WMF renamed this task from Restbase down on beta, 2020-02-07 to Restbase routing down on beta, 2020-02-07.

To help isolate possible culprits, when was this last working?

Pchelolo subscribed.

RESTBase itself seems to be working correctly. Something is wrong with routing before RESTBase. If you look at https://en.wikipedia.beta.wmflabs.org/api/rest_v1/ the default error page by MediaWiki is returned.

To help isolate possible culprits, when was this last working?

Yesterday, but not sure when.

Nothing obvious recently merged in puppet.

Did fix puppet on cache-text05 yesterday; it did a lot of stuff to replace some nginx/varnish stuff with ATS. Maybe related?

That sounds very plausible.

It looks like routing for /api/rest_v1/ in Beta is set up in the prefix puppet settings for deployment-cache-text (as seen here). Not sure what could be breaking it.

Not sure whether T243226 is related. Posting just in case.

deployment-restbase01.deployment-prep.eqiad.wmflabs reports "The last Puppet run was at Mon Jan 20 10:54:08 UTC 2020 (26669 minutes ago)", but as it's still working I guess it's fine?
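
For a sense of how stale that is, the reported figure can be converted with a few lines of Python. This is only a quick arithmetic sketch; it assumes the "26669 minutes ago" value was computed from the timestamp quoted above.

```python
from datetime import datetime, timedelta, timezone

# Values copied from the Puppet status message quoted above.
last_run = datetime(2020, 1, 20, 10, 54, 8, tzinfo=timezone.utc)
age = timedelta(minutes=26669)

# Break the age into days, hours, and minutes.
days = age.days
hours = age.seconds // 3600
minutes = (age.seconds % 3600) // 60

# Assumption: the message was generated at last_run + age.
reported_at = last_run + age

print(days, hours, minutes)      # 18 12 29
print(reported_at.isoformat())   # 2020-02-07T23:23:08+00:00
```

So Puppet had not run on that node for roughly eighteen and a half days.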

Puppet run fails on RESTBase node with:

Error: Could not retrieve catalog from remote server: Error 500 on SERVER: Server Error: Evaluation Error: Error while evaluating a Resource Statement, Evaluation Error: Resource type not found: Wmflib::Service at /etc/puppet/modules/wmflib/functions/service/fetch.pp:4:51 on node deployment-restbase01.deployment-prep.eqiad.wmflabs

Ah, right, that's the please-upgrade-puppet task that @MarcoAurelio linked above.

Looks like profile::trafficserver::backend::mapping_rules in hieradata/labs.yaml only has support for mediawiki and upload; it's missing the restbase section that appears in hieradata/common/profile/trafficserver/backend.yaml.
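
To illustrate why a missing section would produce the symptom seen here, below is a minimal Python sketch of first-match routing. Everything in it is hypothetical: the `Rule` structure, the `route` helper, the host names, and the backend labels are stand-ins for illustration, not the actual ATS remap syntax or the hieradata schema. The point is only that without a restbase rule, /api/rest_v1/ requests fall through to the catch-all MediaWiki mapping.

```python
from typing import List, NamedTuple, Optional


class Rule(NamedTuple):
    """Hypothetical mapping rule; not the real hieradata format."""
    host: Optional[str]   # None matches any host
    path_prefix: str
    backend: str


def route(rules: List[Rule], host: str, path: str) -> Optional[str]:
    """Return the backend of the first rule matching host and path, else None."""
    for rule in rules:
        if rule.host is not None and rule.host != host:
            continue
        if path.startswith(rule.path_prefix):
            return rule.backend
    return None


# Stand-in for a rule set that only knows about upload and mediawiki.
LABS_RULES = [
    Rule("upload.example", "/", "upload"),
    Rule(None, "/", "mediawiki"),
]

# Same rule set with a restbase rule placed before the catch-all.
FIXED_RULES = [
    Rule("upload.example", "/", "upload"),
    Rule(None, "/api/rest_v1/", "restbase"),
    Rule(None, "/", "mediawiki"),
]

if __name__ == "__main__":
    path = "/api/rest_v1/page/html/Talk%3AChat"
    print(route(LABS_RULES, "wiki.example", path))   # mediawiki
    print(route(FIXED_RULES, "wiki.example", path))  # restbase
```

With the first rule set the REST path is answered by the MediaWiki backend, which matches the MediaWiki error page reported earlier in this task.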

(I added some corrected hieradata to cache-text05 in horizon)

Thank you!

+1 – thank you for your quick help on this, @Krenair.