There is no reason not to have error logging enabled. We just have to add it to the haproxy config in Puppet, which would help us understand at which layer issues are happening. This should include a logrotate setup.
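Concretely, the pieces to add might look something like this (illustrative only; the syslog facility, file paths, and retention values are assumptions to be settled in the Puppet patch):

```
# haproxy.cfg fragment: send logs to the local syslog socket
global
    log /dev/log local0 info

defaults
    log global

# /etc/logrotate.d/haproxy: keep the resulting file from growing unbounded
/var/log/haproxy.log {
    daily
    rotate 14
    compress
    missingok
    notifempty
}
```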
Details
Status | Subtype | Assigned | Task
---|---|---|---
Open | BUG REPORT | Bstorm | T266506 Getting "502 Bad Gateway" on Toolforge tools in clusters, including tools ordia and scholia
Resolved | | Bstorm | T266593 Enable error logging on the haproxy layer for kubernetes
Event Timeline
@aborrero I noticed that we are using tcp mode at the haproxy layer. That's because TLS termination is happening at the front proxy? At this point, I'm trying to remember.
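For context, the practical difference between the two modes in haproxy.cfg is roughly this (an illustrative fragment; the bind lines and certificate path are assumptions, not the actual config):

```
# mode tcp: haproxy just shuffles bytes, so TLS can pass through intact
# to whatever terminates it; only connection-level (tcplog) logging works
frontend k8s-ingress
    bind *:443
    mode tcp
    option tcplog
    default_backend k8s-ingress

# mode http: only usable where haproxy itself terminates TLS,
# but then per-request (httplog) logging becomes available
# frontend k8s-ingress-https
#     bind *:443 ssl crt /etc/ssl/private/example.pem   # placeholder cert path
#     mode http
#     option httplog
```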
Change 636765 had a related patch set uploaded (by Bstorm; owner: Bstorm):
[operations/puppet@production] k8s-haproxy: take steps to fix logging
Yes, in Toolforge TLS termination happens in the front proxy VM.
More info here: https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Kubernetes/Networking_and_ingress
Please note that PAWS is a bit different: TLS termination happens in haproxy itself (there is no front proxy there).
Change 636765 merged by Bstorm:
[operations/puppet@production] k8s-haproxy: take steps to fix logging
Change 636975 had a related patch set uploaded (by Bstorm; owner: Bstorm):
[operations/puppet@production] k8s-haproxy: Fix a typo in the logrotate config
Change 636975 merged by Bstorm:
[operations/puppet@production] k8s-haproxy: Fix a typo in the logrotate config
We've got logs! You have to check the active haproxy instance to get them, and they are TCP logs (so not nearly as useful as HTTP logs) because outside of PAWS, TLS termination is not at the haproxy layer.
```
Oct 28 16:43:53 tools-k8s-haproxy-2 haproxy[587]: 172.16.0.17:58860 [28/Oct/2020:16:43:33.605] k8s-ingress k8s-ingress/tools-k8s-ingress-1.tools.eqiad1.wikimedia.cloud 1/10/20022 166 -- 546/405/404/197/0 0/0
Oct 28 16:43:53 tools-k8s-haproxy-2 haproxy[587]: 172.16.0.17:35228 [28/Oct/2020:16:43:53.442] k8s-ingress k8s-ingress/tools-k8s-ingress-2.tools.eqiad1.wikimedia.cloud 1/1/198 14127 -- 545/404/403/206/0 0/0
Oct 28 16:43:53 tools-k8s-haproxy-2 haproxy[587]: 172.16.0.17:35232 [28/Oct/2020:16:43:53.449] k8s-ingress k8s-ingress/tools-k8s-ingress-1.tools.eqiad1.wikimedia.cloud 1/0/200 14127 -- 544/403/402/196/0 0/0
Oct 28 16:43:53 tools-k8s-haproxy-2 haproxy[587]: 172.16.0.17:35260 [28/Oct/2020:16:43:53.537] k8s-ingress k8s-ingress/tools-k8s-ingress-2.tools.eqiad1.wikimedia.cloud 1/0/150 1314 -- 544/403/402/205/0 0/0
Oct 28 16:43:53 tools-k8s-haproxy-2 haproxy[587]: 172.16.0.17:35272 [28/Oct/2020:16:43:53.666] k8s-ingress k8s-ingress/tools-k8s-ingress-1.tools.eqiad1.wikimedia.cloud 1/1/47 420 -- 544/403/402/196/0 0/0
Oct 28 16:43:53 tools-k8s-haproxy-2 haproxy[587]: 172.16.0.17:35280 [28/Oct/2020:16:43:53.742] k8s-ingress k8s-ingress/tools-k8s-ingress-1.tools.eqiad1.wikimedia.cloud 1/1/4 477 -- 546/405/404/197/0 0/0
Oct 28 16:43:53 tools-k8s-haproxy-2 haproxy[587]: 172.16.0.17:34154 [28/Oct/2020:16:43:47.027] k8s-ingress k8s-ingress/tools-k8s-ingress-1.tools.eqiad1.wikimedia.cloud 1/1/6723 2430 -- 545/404/403/196/0 0/0
Oct 28 16:43:53 tools-k8s-haproxy-2 haproxy[587]: 172.16.0.17:35264 [28/Oct/2020:16:43:53.555] k8s-ingress k8s-ingress/tools-k8s-ingress-1.tools.eqiad1.wikimedia.cloud 1/0/236 435 -- 545/404/403/195/0 0/0
Oct 28 16:43:53 tools-k8s-haproxy-2 haproxy[587]: 172.16.0.17:35292 [28/Oct/2020:16:43:53.789] k8s-ingress k8s-ingress/tools-k8s-ingress-2.tools.eqiad1.wikimedia.cloud 1/1/4 477 -- 544/403/402/207/0 0/0
Oct 28 16:43:53 tools-k8s-haproxy-2 haproxy[587]: 172.16.0.17:35276 [28/Oct/2020:16:43:53.698] k8s-ingress k8s-ingress/tools-k8s-ingress-2.tools.eqiad1.wikimedia.cloud 1/1/113 938 -- 543/402/401/206/0 0/0
Oct 28 16:43:53 tools-k8s-haproxy-2 haproxy[587]: 172.16.0.17:35236 [28/Oct/2020:16:43:53.459] k8s-ingress k8s-ingress/tools-k8s-ingress-2.tools.eqiad1.wikimedia.cloud 1/2/359 631 -- 542/401/400/205/0 0/0
```
At the very least, we can see the traffic passing through.
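For reference, haproxy's TCP log format ("option tcplog") breaks a line like the ones above into these fields (meanings per the haproxy documentation; the sample values are from the first log entry above):

```
172.16.0.17:58860                                             client IP:port
[28/Oct/2020:16:43:33.605]                                    accept timestamp
k8s-ingress                                                   frontend name
k8s-ingress/tools-k8s-ingress-1.tools.eqiad1.wikimedia.cloud  backend/server
1/10/20022                                                    Tw/Tc/Tt: queue wait / connect / total session time (ms)
166                                                           bytes read from the server
--                                                            termination state ("--" = clean close)
546/405/404/197/0                                             actconn/feconn/beconn/srv_conn/retries
0/0                                                           srv_queue/backend_queue
```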
@aborrero Looking at the PAWS ingress stuff, I think we can/should put that setup into http mode since it is the TLS terminus. Then we can log the HTTP requests there as well. I could make a totally separate patch for that so that it is easy to roll back in case I am horribly wrong 😁
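Since the PAWS haproxy terminates TLS itself, the switch would be roughly this (a sketch, not the actual patch; the certificate path and frontend names are placeholders):

```
frontend k8s-ingress-https
    bind *:443 ssl crt /etc/ssl/private/paws.pem   # placeholder cert path
    mode http
    option httplog        # logs method, URL, and status per request
    default_backend k8s-ingress
```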
Change 636988 had a related patch set uploaded (by Bstorm; owner: Bstorm):
[operations/puppet@production] paws: Get haproxy logging working
Change 636988 merged by Bstorm:
[operations/puppet@production] paws: Get haproxy logging working
Ok, now paws-k8s-haproxy has logs:
```
Oct 28 18:57:31 paws-k8s-haproxy-1 haproxy[20121]: 127.0.0.1:40024 [28/Oct/2020:18:57:31.313] stats stats/<STATS> 0/0/0/0/0 200 3231 - - LR-- 71/1/0/0/0 0/0 "GET /stats;csv HTTP/1.1"
Oct 28 18:57:31 paws-k8s-haproxy-1 haproxy[20121]: 185.191.171.12:14202 [28/Oct/2020:18:57:31.364] k8s-ingress-https~ k8s-ingress/paws-k8s-ingress-2.paws.eqiad1.wikimedia.cloud 345/2/476 814 -- 72/26/25/11/0 0/0
Oct 28 18:57:33 paws-k8s-haproxy-1 haproxy[20121]: 54.36.148.70:41954 [28/Oct/2020:18:57:33.293] k8s-ingress-http k8s-ingress-http/<NOSRV> 0/-1/-1/-1/0 302 161 - - LR-- 72/1/0/0/0 0/0 "GET /paws-public/42235255/scripts/i18n/interwiki/gu.json HTTP/1.1"
Oct 28 18:57:39 paws-k8s-haproxy-1 haproxy[20121]: 185.191.171.3:39920 [28/Oct/2020:18:57:39.348] k8s-ingress-https~ k8s-ingress/paws-k8s-ingress-1.paws.eqiad1.wikimedia.cloud 43/2/119 834 -- 72/26/25/14/0 0/0
Oct 28 18:57:43 paws-k8s-haproxy-1 haproxy[20121]: 185.191.171.35:65344 [28/Oct/2020:18:57:43.156] k8s-ingress-https~ k8s-ingress/paws-k8s-ingress-2.paws.eqiad1.wikimedia.cloud 20/0/104 5106 -- 72/26/25/11/0 0/0
Oct 28 18:57:44 paws-k8s-haproxy-1 haproxy[20121]: 185.191.171.25:14454 [28/Oct/2020:18:57:44.718] k8s-ingress-https~ k8s-ingress/paws-k8s-ingress-1.paws.eqiad1.wikimedia.cloud 69/1/159 547 -- 72/26/25/14/0 0/0
Oct 28 18:57:46 paws-k8s-haproxy-1 haproxy[20121]: 185.191.171.15:32508 [28/Oct/2020:18:57:46.436] k8s-ingress-https~ k8s-ingress/paws-k8s-ingress-2.paws.eqiad1.wikimedia.cloud 3/2/23 493 -- 71/26/25/11/0 0/0
Oct 28 18:57:46 paws-k8s-haproxy-1 haproxy[20121]: 208.80.154.84:42060 [28/Oct/2020:18:57:46.895] k8s-ingress-http k8s-ingress-http/<NOSRV> 0/-1/-1/-1/0 302 135 - - LR-- 71/1/0/0/0 0/0 "GET /hub/login HTTP/1.1"
Oct 28 18:57:47 paws-k8s-haproxy-1 haproxy[20121]: 185.191.171.43:61018 [28/Oct/2020:18:57:47.515] k8s-ingress-https~ k8s-ingress/paws-k8s-ingress-1.paws.eqiad1.wikimedia.cloud 76/3/238 8783 -- 71/26/25/14/0 0/0
Oct 28 18:57:55 paws-k8s-haproxy-1 haproxy[20121]: 208.80.153.74:39856 [28/Oct/2020:18:57:55.049] k8s-ingress-http k8s-ingress-http/<NOSRV> 0/-1/-1/-1/0 302 135 - - LR-- 71/1/0/0/0 0/0 "GET /hub/login HTTP/1.1"
Oct 28 18:57:58 paws-k8s-haproxy-1 haproxy[20121]: 185.191.171.20:5364 [28/Oct/2020:18:57:58.066] k8s-ingress-https~ k8s-ingress/paws-k8s-ingress-2.paws.eqiad1.wikimedia.cloud 3/2/31 450 -- 71/26/25/11/0 0/0
Oct 28 18:58:04 paws-k8s-haproxy-1 haproxy[20121]: 185.191.171.36:6776 [28/Oct/2020:18:58:04.666] k8s-ingress-https~ k8s-ingress/paws-k8s-ingress-1.paws.eqiad1.wikimedia.cloud 2/2/26 381 -- 71/26/25/14/0 0/0
Oct 28 18:58:06 paws-k8s-haproxy-1 haproxy[20121]: 185.191.171.45:35720 [28/Oct/2020:18:58:06.210] k8s-ingress-https~ k8s-ingress/paws-k8s-ingress-2.paws.eqiad1.wikimedia.cloud 19/1/41 389 -- 71/26/25/11/0 0/0
Oct 28 18:58:12 paws-k8s-haproxy-1 haproxy[20121]: 185.191.171.1:19234 [28/Oct/2020:18:58:11.911] k8s-ingress-https~ k8s-ingress/paws-k8s-ingress-1.paws.eqiad1.wikimedia.cloud 116/2/162 433 -- 71/26/25/14/0 0/0
Oct 28 18:58:12 paws-k8s-haproxy-1 haproxy[20121]: 185.191.171.8:10920 [28/Oct/2020:18:58:12.447] k8s-ingress-http k8s-ingress-http/<NOSRV> 0/-1/-1/-1/0 302 180 - - LR-- 71/1/0/0/0 0/0 "GET /56661271/mediawiki/includes/collation/data/?C=S&O=A HTTP/1.1"
```
Also, I don't know if we really need to switch to http mode here, but I'll close this and put up the patch in case we want to investigate that change.
Put up the patch for http mode, which might be wrong anyway :)
I'm closing this ticket since this is currently doing the stated thing.
I remember from working on the Toolforge haproxy that http mode offered little value and some undesirable overhead. I don't remember why I went that route for PAWS as well, so I don't have a strong opinion today.