The rfc5424 bug is fixed in HAProxy 2.8.10, and that version has already been imported into our apt repositories, so it's time to switch the cp hosts to it and try the rfc5424 format again.
Description
Details
Status | Assigned | Task
---|---|---
In Progress | Fabfur | T351117 Move analytics log from Varnish to HAProxy
Declined | Fabfur | T358109 Install new Benthos instance on cp hosts
Declined | Fabfur | T360454 Better Benthos performances
Resolved | Fabfur | T365718 Switch HAProxy/Benthos to rfc5424
Resolved | Fabfur | T367756 Upgrade hosts to haproxy 2.8.10
Resolved | Vgutierrez | T367963 Investigate increase in CD termination state after upgrading eqsin/ulsfo to HAProxy 2.8.10
Event Timeline
Change #1046674 had a related patch set uploaded (by Fabfur; author: Fabfur):
[operations/puppet@production] hiera: install haproxy 2.8.10 on cp4037
Change #1046674 merged by Fabfur:
[operations/puppet@production] hiera: install haproxy 2.8.10 on cp4037
Mentioned in SAL (#wikimedia-operations) [2024-06-17T15:28:36Z] <fabfur> upgrading haproxy to 2.8.10 on cp4037 (T367756)
Mentioned in SAL (#wikimedia-operations) [2024-06-18T08:43:05Z] <fabfur> cp4037 currently depooled and puppet disabled for T367756
Mentioned in SAL (#wikimedia-operations) [2024-06-18T10:05:19Z] <fabfur> cp3066 currently depooled and puppet disabled for T367756
Change #1047029 had a related patch set uploaded (by Fabfur; author: Fabfur):
[operations/puppet@production] hiera: enable benthos on cp3066
Mentioned in SAL (#wikimedia-operations) [2024-06-18T10:58:14Z] <fabfur> cp3066 repooled and puppet enabled (T367756)
Change #1047039 had a related patch set uploaded (by Fabfur; author: Fabfur):
[operations/puppet@production] hiera: upgrade haproxy to 2.8 on ulsfo
Change #1047039 merged by Fabfur:
[operations/puppet@production] hiera: upgrade haproxy to 2.8 on ulsfo
Mentioned in SAL (#wikimedia-operations) [2024-06-18T12:47:24Z] <fabfur> upgrade haproxy to v2.8.10 on all ulsfo cp hosts (T367756)
Mentioned in SAL (#wikimedia-operations) [2024-06-18T15:36:08Z] <fabfur> upgrade haproxy to v2.8.10 on cp3066 (T367756)
Mentioned in SAL (#wikimedia-operations) [2024-06-18T15:39:28Z] <fabfur> upgrade haproxy to v2.8.10 on cp5030,cp5032 (T367756)
Change #1047436 had a related patch set uploaded (by Fabfur; author: Fabfur):
[operations/puppet@production] hiera: upgrade haproxy to 2.8 on eqsin
Change #1047442 had a related patch set uploaded (by Fabfur; author: Fabfur):
[operations/puppet@production] benthos:cache: switch to rfc5424 format
Change #1047436 merged by Fabfur:
[operations/puppet@production] hiera: upgrade haproxy to 2.8 on eqsin
Mentioned in SAL (#wikimedia-operations) [2024-06-19T08:52:14Z] <fabfur> upgrading eqsin cp hosts to haproxy 2.8.10 (https://gerrit.wikimedia.org/r/c/operations/puppet/+/1047436) (T367756)
Change #1047483 had a related patch set uploaded (by Fabfur; author: Fabfur):
[operations/puppet@production] hiera: test downgrading haproxy on cp5017
Change #1047483 merged by Fabfur:
[operations/puppet@production] hiera: test downgrading haproxy on cp5017
Change #1047442 merged by Fabfur:
[operations/puppet@production] benthos:cache: switch to rfc5424 format
Change #1047536 had a related patch set uploaded (by Fabfur; author: Fabfur):
[operations/puppet@production] benthos:cache: fixed typo in field name
Change #1047536 merged by Fabfur:
[operations/puppet@production] benthos:cache: fixed typo in field name
Change #1047545 had a related patch set uploaded (by Fabfur; author: Fabfur):
[operations/puppet@production] benthos:cache: delete message if sequence number is missing
Change #1047545 merged by Fabfur:
[operations/puppet@production] benthos:cache: delete message if sequence number is missing
After upgrading HAProxy to 2.8.10 on all of ulsfo, we still see some errors in the Kafka DLQ, for example two log messages merged into a single record:
<134>1 2024-06-19T15:42:31.440224+00:00 - haproxy 1227962 - [benthoslog haproxy_pid="1227962" ip="<REDACTED>" sequence="1582219" accept_date="19/Jun/2024:15:42:30.993" time_backend_response="287" http_status="200" response_size="301073" termination_state="--" uri_host="be-tarask.wikipedia.org" referer="https://be-tarask.wikipedia.org/wiki/%D0%92%D1%96%D0%BA%D1%96%D0%BF%D1%8D%D0%B4%D1%8B%D1%8F:%D0%9F%D1%80%D0%B0%D0%B5%D0%BA%D1%82:%D0%A2%D1%8D%D0%BC%D0%B0%D1%82%D1%8B%D1%87%D0%BD%D1%8B_%D1%82%D1%8B%D0%B4%D0%B7%D0%B5%D0%BD%D1%8C/%D0%90%D1%80%D1%88%D0%B0%D0%BD%D1%81%D0%BA%D0%B0%D1%8F_%D0%B1%D1%96%D1%82%D0%B2%D0%B0" user_agent="Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/17.4 Safari/605.1.15 (Applebot/0.1; +http://www.apple.com/go/applebot)" accept_language="en-US" range="-" accept="*/*" tls="vers=TLSv1.3;keyx=UNKNOWN;auth=ECDSA;ciph=AES-256-GCM-SHA384;prot=h1;sess=new" cache_status="miss" content_type="text/javascript; charset=" x_analytics="WMF-Last-Access=19-Jun-2024;WMF-Last-Access-Global=19-Jun-2024;https=1;client_port=35204" x_cache="cp4044 miss, cp4044 miss" backend="mw-web.codfw.main-7666f67487-nw9fq" http_method="GET" uri_path="/w/load.php" 
uri_query="?lang=be-tarask&modules=ext.categoryTree%2CeventLogging%2CnavigationTiming%2Cpopups%2CwikimediaEvents%7Cext.centralNotice.choiceData%2Cdisplay%2CgeoIP%2CimpressionDiet%2CkvStore%2CstartUp%7Cext.centralauth.centralautologin%7Cext.checkUser.clientHints%7Cext.cx.eventlogging.campaigns%7Cext.discussionTools.init%2Cminervaicons%7Cext.echo.centralauth%7Cext.uls.interface%2Cpreferences%2Cwebfonts%7Cext.urlShortener.toolbar%7Cjquery%2Cmoment%2Coojs%2Coojs-ui-core%2Coojs-ui-windows%2Crangefix%2Csite%7Cjquery.client%2CtextSelection%7Cmediawiki.String%2CTitle%2CUri%2Capi%2Cbase%2Ccldr%2Ccookie%2Cexperiments%2CjqueryMsg%2Clanguage%2Crouter%2Cstorage%2Ctoc%2Cuser%2Cutil%2CvisibleTimeout%7Cmediawiki.editfont.styles%7Cmediawiki.libs.pluralruleparser%7Cmediawiki.page.media%2Cready%7Cmediawiki.page.watch.ajax%7Cmmv.bootstrap%2Ccodex%2Chead%7Cmmv.bootstrap.autostart%7Coojs-ui-windows.icons%7Cskins.vector.clientPreferences%2Cjs%7Cskins.vector.icons.js&skin=vector-2022&versio<134>1 2024-06-19T15:42:31.441664+00:00 - haproxy 1227962 - [benthoslog haproxy_pid="1227962" ip="<REDACTED>" sequence="1582448" accept_date="19/Jun/2024:15:42:31.440" time_backend_response="1" http_status="200" response_size="17243" termination_state="--" uri_host="en.m.wikipedia.org" referer="https://en.m.wikipedia.org/wiki/13th_Amendment_to_the_United_States_Constitution" user_agent="Mozilla/5.0 (Linux; Android 10; K) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/125.0.0.0 Mobile Safari/537.36" accept_language="en-US,en;q=0.9" range="-" accept="text/css,*/*;q=0.1" tls="vers=TLSv1.3;keyx=UNKNOWN;auth=ECDSA;ciph=AES-256-GCM-SHA384;prot=h2;sess=new" cache_status="hit-front" content_type="text/css; charset=utf-8" x_analytics="WMF-Last-Access=19-Jun-2024;WMF-Last-Access-Global=19-Jun-2024;https=1;client_port=39466" x_cache="cp4044 hit, cp4044 hit/1766" backend="ATS/9.1.4" http_method="GET" uri_path="/w/load.php" 
uri_query="?lang=en&modules=ext.cite.styles%7Cext.relatedArticles.styles%7Cext.wikimediaBadges%7Cext.wikimediamessages.styles%7Cmediawiki.hlist%7Cmobile.init.styles%7Cskins.minerva.amc.styles%7Cskins.minerva.base.styles%7Cskins.minerva.codex.styles%7Cskins.minerva.content.styles.images%7Cskins.minerva.icons.wikimedia%7Cskins.minerva.mainMenu.icons%2Cstyles%7Cwikibase.client.init&only=styles&skin=minerva"] 1582448 1 0 0 200 {en.m.wikipedia.org} {hit-front} -
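The record above contains two RFC 5424 headers (`<134>1 <timestamp>`): the first message is cut off in the middle of `uri_query` and the second one starts right after it. A minimal Python sketch for flagging such records in a DLQ dump could look like the following. The function names are hypothetical and this is not the Benthos pipeline logic; it assumes each DLQ record is available as one string, and that the header pattern does not legitimately occur inside a field value.

```python
from __future__ import annotations

import re

# Start of an RFC 5424 message: "<PRI>VERSION SP TIMESTAMP",
# e.g. "<134>1 2024-06-19T15:42:31.440224+00:00".
HEADER_RE = re.compile(r"<\d{1,3}>1 \d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}")


def count_headers(record: str) -> int:
    """Return how many RFC 5424 message headers appear in one record."""
    return len(HEADER_RE.findall(record))


def is_merged(record: str) -> bool:
    """A record holding more than one header is treated as merged."""
    return count_headers(record) > 1


def split_merged(record: str) -> list[str]:
    """Split a record at every header start (hypothetical helper).

    Any text before the first header is returned as its own leading piece.
    """
    starts = [m.start() for m in HEADER_RE.finditer(record)]
    if not starts:
        return [record]
    bounds = starts + [len(record)]
    pieces = [record[a:b] for a, b in zip(bounds, bounds[1:])]
    if starts[0] > 0:
        pieces.insert(0, record[: starts[0]])
    return pieces
```

Splitting only separates the pieces for inspection; the first piece of a merged record is still truncated and cannot be recovered this way.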
Change #1049104 had a related patch set uploaded (by Fabfur; author: Fabfur):
[operations/puppet@production] hiera: upgrade haproxy to 2.8 on eqsin
Mentioned in SAL (#wikimedia-operations) [2024-06-25T13:07:26Z] <fabfur> temporary disabled puppet on cp4037 to test benthos configuration (T367756)
Update on this investigation: capturing the traffic from HAProxy to Benthos with tcpdump doesn't show the "log merging". The data in the captured frames looks correctly encoded, so the problem is probably on the Benthos side (parsing).
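One way to check captured payloads independently of Benthos is to parse each one as a single RFC 5424 message and reject anything that is not exactly one header followed by one complete structured-data element. The sketch below is a rough illustration under that assumption; `parse_message` is a hypothetical helper, the regular expression covers only the message shape seen in the sample above (one structured-data element, `key="value"` parameters), and it is not a full RFC 5424 parser.

```python
from __future__ import annotations

import re

# PRI, VERSION, then TIMESTAMP HOSTNAME APP-NAME PROCID MSGID, then one
# structured-data element: "[sd-id key="value" key="value" ...]".
MSG_RE = re.compile(
    r'<(\d{1,3})>1 (\S+) (\S+) (\S+) (\S+) (\S+) '
    r'\[([^\s\]="]+)((?:\s+[A-Za-z0-9_]+="(?:[^"\\]|\\.)*")*)\s*\]'
)
PARAM_RE = re.compile(r'([A-Za-z0-9_]+)="((?:[^"\\]|\\.)*)"')


def parse_message(payload: str) -> dict[str, str] | None:
    """Return the structured-data parameters, or None if malformed.

    A payload that is truncated before the closing "]" does not match,
    and neither does one where a second message starts inside a value.
    """
    m = MSG_RE.match(payload)
    if m is None:
        return None
    return dict(PARAM_RE.findall(m.group(8)))
```

Running this over payloads extracted from a capture and counting the `None` results gives a quick signal on whether malformed messages are already present on the wire or only appear after parsing.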
Change #1049104 merged by Fabfur:
[operations/puppet@production] hiera: upgrade haproxy to 2.8 on drmrs
Mentioned in SAL (#wikimedia-operations) [2024-07-01T10:23:38Z] <fabfur> upgrading A:cp-drmrs to haproxy 2.8.10 (T367756)
Change #1051110 had a related patch set uploaded (by Fabfur; author: Fabfur):
[operations/puppet@production] hiera: upgrade haproxy to 2.8 on magru
Change #1051130 had a related patch set uploaded (by Fabfur; author: Fabfur):
[operations/puppet@production] hiera: removed unused haproxy28 overrides
Change #1051110 merged by Fabfur:
[operations/puppet@production] hiera: upgrade haproxy to 2.8 on magru
Mentioned in SAL (#wikimedia-operations) [2024-07-01T12:49:18Z] <fabfur> upgrading A:cp-magru to haproxy 2.8.10 (T367756)
Change #1051130 merged by Fabfur:
[operations/puppet@production] hiera: removed unused haproxy28 overrides
Change #1051143 had a related patch set uploaded (by Fabfur; author: Fabfur):
[operations/puppet@production] hiera: upgrade haproxy to 2.8 on codfw
Mentioned in SAL (#wikimedia-operations) [2024-07-01T14:35:40Z] <fabfur> upgrading A:cp-codfw to haproxy 2.8.10 (T367756)
Change #1051143 merged by Fabfur:
[operations/puppet@production] hiera: upgrade haproxy to 2.8 on codfw
Change #1051292 had a related patch set uploaded (by Fabfur; author: Fabfur):
[operations/puppet@production] hiera: upgrade haproxy to 2.8 on eqiad
Change #1051292 merged by Fabfur:
[operations/puppet@production] hiera: upgrade haproxy to 2.8 on eqiad
Mentioned in SAL (#wikimedia-operations) [2024-07-02T10:28:30Z] <fabfur> upgrading A:cp-eqiad to haproxy 2.8.10 (T367756)
Change #1051348 had a related patch set uploaded (by Fabfur; author: Fabfur):
[operations/puppet@production] hiera: upgrade haproxy to 2.8 on esams
Change #1051348 merged by Fabfur:
[operations/puppet@production] hiera: upgrade haproxy to 2.8 on esams
Mentioned in SAL (#wikimedia-operations) [2024-07-02T14:55:46Z] <fabfur> upgrading A:cp-esams to haproxy 2.8.10 (T367756)
Change #1051394 had a related patch set uploaded (by Fabfur; author: Fabfur):
[operations/puppet@production] hiera: consolidate haproxy version to 2.8
Change #1051394 merged by Fabfur:
[operations/puppet@production] hiera: consolidate haproxy version to 2.8
Change #1047029 abandoned by Fabfur:
[operations/puppet@production] hiera: enable benthos on cp3066
Reason:
See T370741