Page MenuHomePhabricator

Enable error logging on the haproxy layer for kubernetes
Closed, ResolvedPublic

Description

There is no reason we don't have logging enabled for errors. We just have to add that to the config in puppet, and that would help to understand what layer issues are happening at. This should include a logrotate setup.

Event Timeline

Bstorm triaged this task as High priority.Oct 27 2020, 6:51 PM
Bstorm created this task.
bd808 awarded a token.Oct 27 2020, 7:03 PM

@aborrero I noticed that we are using tcp mode at the haproxy layer. That's because TLS termination is happening at the front proxy? At this point, I'm trying to remember.

Change 636765 had a related patch set uploaded (by Bstorm; owner: Bstorm):
[operations/puppet@production] k8s-haproxy: take steps to fix logging

https://gerrit.wikimedia.org/r/636765

@aborrero I noticed that we are using tcp mode at the haproxy layer. That's because TLS termination is happening at the front proxy? At this point, I'm trying to remember.

Yes, in Toolforge TLS termination happens in the front proxy VM.

More info here: https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Kubernetes/Networking_and_ingress

Please note that in PAWS is a bit different: TLS termination happens in haproxy itself (there is no front proxy there).

Change 636765 merged by Bstorm:
[operations/puppet@production] k8s-haproxy: take steps to fix logging

https://gerrit.wikimedia.org/r/636765

Change 636975 had a related patch set uploaded (by Bstorm; owner: Bstorm):
[operations/puppet@production] k8s-haproxy: Fix a typo in the logrotate config

https://gerrit.wikimedia.org/r/636975

Change 636975 merged by Bstorm:
[operations/puppet@production] k8s-haproxy: Fix a typo in the logrotate config

https://gerrit.wikimedia.org/r/636975

Bstorm closed this task as Resolved.Oct 28 2020, 4:45 PM
Bstorm claimed this task.

We've got logs! You have to check the active haproxy instance to get them, and they are TCP logs (so not nearly as useful as HTTP logs) because outside of PAWS, TLS termination is not at the haproxy layer.

Oct 28 16:43:53 tools-k8s-haproxy-2 haproxy[587]: 172.16.0.17:58860 [28/Oct/2020:16:43:33.605] k8s-ingress k8s-ingress/tools-k8s-ingress-1.tools.eqiad1.wikimedia.cloud 1/10/20022 166 -- 546/405/404/197/0 0/0
Oct 28 16:43:53 tools-k8s-haproxy-2 haproxy[587]: 172.16.0.17:35228 [28/Oct/2020:16:43:53.442] k8s-ingress k8s-ingress/tools-k8s-ingress-2.tools.eqiad1.wikimedia.cloud 1/1/198 14127 -- 545/404/403/206/0 0/0
Oct 28 16:43:53 tools-k8s-haproxy-2 haproxy[587]: 172.16.0.17:35232 [28/Oct/2020:16:43:53.449] k8s-ingress k8s-ingress/tools-k8s-ingress-1.tools.eqiad1.wikimedia.cloud 1/0/200 14127 -- 544/403/402/196/0 0/0
Oct 28 16:43:53 tools-k8s-haproxy-2 haproxy[587]: 172.16.0.17:35260 [28/Oct/2020:16:43:53.537] k8s-ingress k8s-ingress/tools-k8s-ingress-2.tools.eqiad1.wikimedia.cloud 1/0/150 1314 -- 544/403/402/205/0 0/0
Oct 28 16:43:53 tools-k8s-haproxy-2 haproxy[587]: 172.16.0.17:35272 [28/Oct/2020:16:43:53.666] k8s-ingress k8s-ingress/tools-k8s-ingress-1.tools.eqiad1.wikimedia.cloud 1/1/47 420 -- 544/403/402/196/0 0/0
Oct 28 16:43:53 tools-k8s-haproxy-2 haproxy[587]: 172.16.0.17:35280 [28/Oct/2020:16:43:53.742] k8s-ingress k8s-ingress/tools-k8s-ingress-1.tools.eqiad1.wikimedia.cloud 1/1/4 477 -- 546/405/404/197/0 0/0
Oct 28 16:43:53 tools-k8s-haproxy-2 haproxy[587]: 172.16.0.17:34154 [28/Oct/2020:16:43:47.027] k8s-ingress k8s-ingress/tools-k8s-ingress-1.tools.eqiad1.wikimedia.cloud 1/1/6723 2430 -- 545/404/403/196/0 0/0
Oct 28 16:43:53 tools-k8s-haproxy-2 haproxy[587]: 172.16.0.17:35264 [28/Oct/2020:16:43:53.555] k8s-ingress k8s-ingress/tools-k8s-ingress-1.tools.eqiad1.wikimedia.cloud 1/0/236 435 -- 545/404/403/195/0 0/0
Oct 28 16:43:53 tools-k8s-haproxy-2 haproxy[587]: 172.16.0.17:35292 [28/Oct/2020:16:43:53.789] k8s-ingress k8s-ingress/tools-k8s-ingress-2.tools.eqiad1.wikimedia.cloud 1/1/4 477 -- 544/403/402/207/0 0/0
Oct 28 16:43:53 tools-k8s-haproxy-2 haproxy[587]: 172.16.0.17:35276 [28/Oct/2020:16:43:53.698] k8s-ingress k8s-ingress/tools-k8s-ingress-2.tools.eqiad1.wikimedia.cloud 1/1/113 938 -- 543/402/401/206/0 0/0
Oct 28 16:43:53 tools-k8s-haproxy-2 haproxy[587]: 172.16.0.17:35236 [28/Oct/2020:16:43:53.459] k8s-ingress k8s-ingress/tools-k8s-ingress-2.tools.eqiad1.wikimedia.cloud 1/2/359 631 -- 542/401/400/205/0 0/0

At very least, we can see the traffic pass through.

Bstorm reopened this task as Open.Oct 28 2020, 4:46 PM

Actually, re-opening and copying this over to the PAWS setup.

@aborrero Looking at the PAWS ingress stuff, I think we can/should put that setup into http mode since it is the TLS terminus. Then we can log the HTTP requests there as well. I could make a totally separate patch for that so that it is easy to roll back in case I am horribly wrong 😁

Change 636988 had a related patch set uploaded (by Bstorm; owner: Bstorm):
[operations/puppet@production] paws: Get haproxy logging working

https://gerrit.wikimedia.org/r/636988

Change 636988 merged by Bstorm:
[operations/puppet@production] paws: Get haproxy logging working

https://gerrit.wikimedia.org/r/636988

Ok, now paws-k8s-haproxy has logs:

Oct 28 18:57:31 paws-k8s-haproxy-1 haproxy[20121]: 127.0.0.1:40024 [28/Oct/2020:18:57:31.313] stats stats/<STATS> 0/0/0/0/0 200 3231 - - LR-- 71/1/0/0/0 0/0 "GET /stats;csv HTTP/1.1"
Oct 28 18:57:31 paws-k8s-haproxy-1 haproxy[20121]: 185.191.171.12:14202 [28/Oct/2020:18:57:31.364] k8s-ingress-https~ k8s-ingress/paws-k8s-ingress-2.paws.eqiad1.wikimedia.cloud 345/2/476 814 -- 72/26/25/11/0 0/0
Oct 28 18:57:33 paws-k8s-haproxy-1 haproxy[20121]: 54.36.148.70:41954 [28/Oct/2020:18:57:33.293] k8s-ingress-http k8s-ingress-http/<NOSRV> 0/-1/-1/-1/0 302 161 - - LR-- 72/1/0/0/0 0/0 "GET /paws-public/42235255/scripts/i18n/interwiki/gu.json HTTP/1.1"
Oct 28 18:57:39 paws-k8s-haproxy-1 haproxy[20121]: 185.191.171.3:39920 [28/Oct/2020:18:57:39.348] k8s-ingress-https~ k8s-ingress/paws-k8s-ingress-1.paws.eqiad1.wikimedia.cloud 43/2/119 834 -- 72/26/25/14/0 0/0
Oct 28 18:57:43 paws-k8s-haproxy-1 haproxy[20121]: 185.191.171.35:65344 [28/Oct/2020:18:57:43.156] k8s-ingress-https~ k8s-ingress/paws-k8s-ingress-2.paws.eqiad1.wikimedia.cloud 20/0/104 5106 -- 72/26/25/11/0 0/0
Oct 28 18:57:44 paws-k8s-haproxy-1 haproxy[20121]: 185.191.171.25:14454 [28/Oct/2020:18:57:44.718] k8s-ingress-https~ k8s-ingress/paws-k8s-ingress-1.paws.eqiad1.wikimedia.cloud 69/1/159 547 -- 72/26/25/14/0 0/0
Oct 28 18:57:46 paws-k8s-haproxy-1 haproxy[20121]: 185.191.171.15:32508 [28/Oct/2020:18:57:46.436] k8s-ingress-https~ k8s-ingress/paws-k8s-ingress-2.paws.eqiad1.wikimedia.cloud 3/2/23 493 -- 71/26/25/11/0 0/0
Oct 28 18:57:46 paws-k8s-haproxy-1 haproxy[20121]: 208.80.154.84:42060 [28/Oct/2020:18:57:46.895] k8s-ingress-http k8s-ingress-http/<NOSRV> 0/-1/-1/-1/0 302 135 - - LR-- 71/1/0/0/0 0/0 "GET /hub/login HTTP/1.1"
Oct 28 18:57:47 paws-k8s-haproxy-1 haproxy[20121]: 185.191.171.43:61018 [28/Oct/2020:18:57:47.515] k8s-ingress-https~ k8s-ingress/paws-k8s-ingress-1.paws.eqiad1.wikimedia.cloud 76/3/238 8783 -- 71/26/25/14/0 0/0
Oct 28 18:57:55 paws-k8s-haproxy-1 haproxy[20121]: 208.80.153.74:39856 [28/Oct/2020:18:57:55.049] k8s-ingress-http k8s-ingress-http/<NOSRV> 0/-1/-1/-1/0 302 135 - - LR-- 71/1/0/0/0 0/0 "GET /hub/login HTTP/1.1"
Oct 28 18:57:58 paws-k8s-haproxy-1 haproxy[20121]: 185.191.171.20:5364 [28/Oct/2020:18:57:58.066] k8s-ingress-https~ k8s-ingress/paws-k8s-ingress-2.paws.eqiad1.wikimedia.cloud 3/2/31 450 -- 71/26/25/11/0 0/0
Oct 28 18:58:04 paws-k8s-haproxy-1 haproxy[20121]: 185.191.171.36:6776 [28/Oct/2020:18:58:04.666] k8s-ingress-https~ k8s-ingress/paws-k8s-ingress-1.paws.eqiad1.wikimedia.cloud 2/2/26 381 -- 71/26/25/14/0 0/0
Oct 28 18:58:06 paws-k8s-haproxy-1 haproxy[20121]: 185.191.171.45:35720 [28/Oct/2020:18:58:06.210] k8s-ingress-https~ k8s-ingress/paws-k8s-ingress-2.paws.eqiad1.wikimedia.cloud 19/1/41 389 -- 71/26/25/11/0 0/0
Oct 28 18:58:12 paws-k8s-haproxy-1 haproxy[20121]: 185.191.171.1:19234 [28/Oct/2020:18:58:11.911] k8s-ingress-https~ k8s-ingress/paws-k8s-ingress-1.paws.eqiad1.wikimedia.cloud 116/2/162 433 -- 71/26/25/14/0 0/0
Oct 28 18:58:12 paws-k8s-haproxy-1 haproxy[20121]: 185.191.171.8:10920 [28/Oct/2020:18:58:12.447] k8s-ingress-http k8s-ingress-http/<NOSRV> 0/-1/-1/-1/0 302 180 - - LR-- 71/1/0/0/0 0/0 "GET /56661271/mediawiki/includes/collation/data/?C=S&O=A HTTP/1.1"

Also, I don't know if we really need to switch to http mode here, but I'll close this and put up the patch in case we want to investigate that change.

Bstorm closed this task as Resolved.Oct 28 2020, 7:19 PM

Put up the patch for http mode, which might be wrong anyway :)
I'm closing this ticket since this is currently doing the stated thing.

@aborrero Looking at the PAWS ingress stuff, I think we can/should put that setup into http mode since it is the TLS terminus. Then we can log the HTTP requests there as well. I could make a totally separate patch for that so that it is easy to roll back in case I am horribly wrong 😁

I remember when working on the Toolforge haproxy that the http mode offered little value and some undesirable overhead. I don't remember why I went that route too for PAWS, so I don't have a strong opinion today.