
Serve Wikidata traffic via Kubernetes
Closed, Resolved, Public

Description

We currently serve 8% of all traffic via k8s; since the change "mw-on-k8s: Remove wikidata exception", this is supposed to apply to Wikidata as well. However, I'm not yet seeing any responses that indicate they come from a k8s server:

$ for i in {1..100}; do curl -sIH 'User-Agent: test-Iebdc15b19b (lucas.werkmeister@wikimedia.de)' https://www.wikidata.org/wiki/Special:Random; done | grep -i '^server:' | grep -vF .codfw.wmnet
(no output)

By comparison, English Wikipedia is showing an expected rate of k8s responses:

$ for i in {1..100}; do curl -sIH 'User-Agent: test-Iebdc15b19b (lucas.werkmeister@wikimedia.de)' https://en.wikipedia.org/wiki/Special:Random; done | grep -i '^server:' | grep -vF .codfw.wmnet
server: mw-web.codfw.main-7bc9ddd6f8-t5zvm
server: mw-web.codfw.canary-64cbcc6854-tgpzn
server: mw-web.codfw.main-7bc9ddd6f8-pqxd4
server: mw-web.codfw.main-7bc9ddd6f8-hxkw8
server: mw-web.codfw.main-7bc9ddd6f8-pqxd4
server: mw-web.codfw.main-7bc9ddd6f8-c7pkc
server: mw-web.codfw.canary-64cbcc6854-vvp52

In addition to the 100 requests shown above, I made 337 more (37 manually and three more loop runs, all within the last 15 minutes or so), so in total 437 requests were seemingly not served by k8s. Seems worth investigating.
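As a rough sanity check, assume each request independently has about an 8% chance of being routed to k8s (the overall share quoted above; treating individual requests as independent draws is an assumption, not something stated in the task). English Wikipedia's 7 of 100 is then close to expectation, while seeing zero k8s responses for Wikidata by chance is vanishingly unlikely:

```latex
\begin{aligned}
\mathbb{E}[\text{k8s responses out of } 100] &= 100 \times 0.08 = 8 \qquad (\text{English Wikipedia showed } 7),\\
P(\text{0 k8s responses out of } 437) &= (1 - 0.08)^{437} = 0.92^{437} \approx 1.5 \times 10^{-16}.
\end{aligned}
```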

Event Timeline

Restricted Application added a subscriber: Aklapper.

@Jdforrester-WMF no, this task is actually about that patch not having the effect we expected.

Interestingly, I do get correct results for m.wikidata.org, but somehow not for www.wikidata.org (also, please grep for mw-web as we've repooled eqiad in the meantime).
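To compare the two domains with the suggested filter, here is a minimal bash sketch. It counts responses whose `server:` header starts with `mw-web.` (the k8s pod naming visible in the outputs in this task) rather than excluding `.codfw.wmnet`, which would presumably let bare-metal eqiad hosts through now that eqiad is repooled. The function names `k8s_count` and `compare_hosts` and the User-Agent string are hypothetical placeholders; substitute your own tool name and contact address.

```shell
#!/usr/bin/env bash
# Hypothetical helpers, not taken from the task. Assumption: MediaWiki-on-k8s
# pods answer with "server: mw-web.<dc>.<release>-<pod>", as in the outputs
# in this task, while bare-metal appservers report a plain hostname.

# Placeholder User-Agent; replace with your own tool name and contact address.
UA='User-Agent: k8s-routing-check (you@example.org)'

# k8s_count URL [N]: send N HEAD requests (default 100) and print how many
# responses came from a k8s pod, in either datacenter.
k8s_count() {
  local url=$1 n=${2:-100} i=1
  while [ "$i" -le "$n" ]; do
    curl -sI -H "$UA" "$url"
    i=$((i + 1))
  done | grep -ic '^server: mw-web\.'
}

# compare_hosts [N]: run the same check against the desktop and mobile domains.
compare_hosts() {
  local n=${1:-100} host
  for host in www.wikidata.org m.wikidata.org; do
    printf '%s: %s of %s via k8s\n' \
      "$host" "$(k8s_count "https://$host/wiki/Special:Random" "$n")" "$n"
  done
}
```

Running `compare_hosts 100` repeats the 100-request loop for both domains side by side; given the observations in this comment, before a fix only m.wikidata.org would be expected to show a nonzero count.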

This makes the whole thing even more puzzling tbh.

Mentioned in SAL (#wikimedia-operations) [2023-09-28T07:25:38Z] <_joe_> restarting trafficserver on cp1081 T347493

I tried restarting ATS on a backend, cp1081, then made requests for Wikidata's Special:Random directly to trafficserver: they all still went to appservers on bare metal.

So the problem isn't in mw-on-k8s.lua, apparently...

Change 961684 had a related patch set uploaded (by Giuseppe Lavagetto; author: Giuseppe Lavagetto):

[operations/puppet@production] wikidata: add mw-on-k8s routing

https://gerrit.wikimedia.org/r/961684

Well, it turns out the issue was simpler: we even had a TODO in the code:

# TODO: add mw-on-k8s once we think of moving wikidata or partial traffic.

Sigh. Thanks @Lucas_Werkmeister_WMDE for noticing, this will be fixed as soon as I get a review.

Change 961684 merged by Giuseppe Lavagetto:

[operations/puppet@production] wikidata: add mw-on-k8s routing

https://gerrit.wikimedia.org/r/961684

Seems to be working now, thanks a lot for fixing it!

$ for i in {1..100}; do curl -sIH 'User-Agent: test-Iebdc15b19b (lucas.werkmeister@wikimedia.de)' https://www.wikidata.org/wiki/Special:Random; done | grep -i '^server:.*-'
server: mw-web.eqiad.main-c5bd4c67d-88cc4
server: mw-web.eqiad.main-c5bd4c67d-vcl9q
server: mw-web.eqiad.main-c5bd4c67d-654wg
server: mw-web.eqiad.main-c5bd4c67d-88cc4
server: mw-web.eqiad.main-c5bd4c67d-vcl9q
server: mw-web.eqiad.main-c5bd4c67d-s5jdm
server: mw-web.eqiad.main-c5bd4c67d-vcl9q
server: mw-web.eqiad.main-c5bd4c67d-rksxb
server: mw-web.eqiad.canary-69956bff4b-t7498
server: mw-web.eqiad.main-c5bd4c67d-rksxb

Clement_Goubert claimed this task.

Yes, confirmed now working. Resolving.