
Fabfur (Fabrizio Furnari)
User

User Details

User Since
May 15 2023, 9:41 AM (58 w, 2 d)
Availability
Available
IRC Nick
fabfur
LDAP User
Fabfur
MediaWiki User
FFurnari-WMF

Recent Activity

Thu, Jun 20

Fabfur added a comment to T367756: Upgrade hosts to haproxy 2.8.10.

Following also https://github.com/haproxy/haproxy/issues/2612

Thu, Jun 20, 7:55 PM · Patch-For-Review, Data-Engineering, Observability-Logging, Traffic

Wed, Jun 19

Fabfur added a comment to T367756: Upgrade hosts to haproxy 2.8.10.

After upgrading HAProxy to 2.8.10 on whole ulsfo we still see some errors in the kafka DLQ like:

Wed, Jun 19, 3:46 PM · Patch-For-Review, Data-Engineering, Observability-Logging, Traffic
Fabfur created T367963: Investigate increase in CD termination state after upgrading eqsin/ulsfo to HAProxy 2.8.10.
Wed, Jun 19, 11:51 AM · Patch-For-Review, Data-Engineering, Traffic

Tue, Jun 18

Fabfur renamed T367756: Upgrade hosts to haproxy 2.8.10 from Upgrade ulsfo hosts to haproxy 2.8.10 to Upgrade hosts to haproxy 2.8.10.
Tue, Jun 18, 10:05 AM · Patch-For-Review, Data-Engineering, Observability-Logging, Traffic

Mon, Jun 17

Fabfur created T367756: Upgrade hosts to haproxy 2.8.10.
Mon, Jun 17, 1:29 PM · Patch-For-Review, Data-Engineering, Observability-Logging, Traffic

Thu, Jun 13

Fabfur added a comment to T360454: Better Benthos performances.

Update on Benthos performances.

Thu, Jun 13, 9:18 AM · Patch-For-Review, Data-Engineering, Observability-Logging, Traffic

Fri, Jun 7

Fabfur created T366891: HAProxy 3.0 production rollout.
Fri, Jun 7, 10:22 AM · Traffic
Fabfur created T366890: Create new haproxy30 component and import bullseye package.
Fri, Jun 7, 10:21 AM · Traffic
Fabfur created T366888: Check production configuration compatibility with HAProxy 3.0.
Fri, Jun 7, 10:20 AM · Traffic
Fabfur created T366887: Backport HAProxy 3.0 to Bullseye.
Fri, Jun 7, 10:18 AM · Traffic
Fabfur created T366885: Upgrade HAProxy to version 3 on cp hosts.
Fri, Jun 7, 10:17 AM · Patch-For-Review, Traffic

Wed, Jun 5

Fabfur updated the task description for T366466: Use IPIP encapsulation on lvs<-->text cluster.
Wed, Jun 5, 1:23 PM · Patch-For-Review, Traffic

Tue, Jun 4

Fabfur created T366606: HAProxy must start after network is really up.
Tue, Jun 4, 2:27 PM · Traffic

Thu, May 30

Fabfur added a comment to T365718: Switch HAProxy/Benthos to rfc5424.

This has been reverted due to a bug in HAProxy regarding how log variables are escaped in the rfc5424 format using the +E flag.

Thu, May 30, 9:56 PM · Patch-For-Review, Data-Engineering, Observability-Logging, Traffic

Tue, May 28

Fabfur closed T365566: HAProxy should not log information we don't actually need as Resolved.
Tue, May 28, 8:12 PM · Patch-For-Review, Data-Engineering, Observability-Logging, Traffic
Fabfur closed T365566: HAProxy should not log information we don't actually need, a subtask of T360454: Better Benthos performances, as Resolved.
Tue, May 28, 8:12 PM · Patch-For-Review, Data-Engineering, Observability-Logging, Traffic

Mon, May 27

Fabfur updated the task description for T366031: Upgrade Benthos package on cp hosts.
Mon, May 27, 8:20 PM · Data-Engineering, Observability-Logging, Traffic
Fabfur created T366031: Upgrade Benthos package on cp hosts.
Mon, May 27, 8:19 PM · Data-Engineering, Observability-Logging, Traffic

May 27 2024

Fabfur created T365968: Install benthos on single esams host to check performances under higher load.
May 27 2024, 8:50 AM · Patch-For-Review, Data-Engineering, Observability-Logging, Traffic

May 23 2024

Fabfur created T365718: Switch HAProxy/Benthos to rfc5424.
May 23 2024, 2:23 PM · Patch-For-Review, Data-Engineering, Observability-Logging, Traffic

May 22 2024

Fabfur created T365566: HAProxy should not log information we don't actually need.
May 22 2024, 8:53 AM · Patch-For-Review, Data-Engineering, Observability-Logging, Traffic

May 21 2024

Fabfur updated subscribers of T365456: Move HTTP/1.0 requests rejections at HAProxy level.
May 21 2024, 10:58 AM · Patch-For-Review, Traffic
Fabfur created T365456: Move HTTP/1.0 requests rejections at HAProxy level.
May 21 2024, 10:43 AM · Patch-For-Review, Traffic
Fabfur added a comment to T365441: Umbrella task for Benthos parsing error.

A missing Host header in the request results in a 400 from Varnish and a parsing error from Varnish:

May 21 2024, 8:38 AM · Data-Engineering, Observability-Logging, Traffic
Fabfur added a parent task for T365117: HAProxy log format doesn't support "invalid" request path: T365441: Umbrella task for Benthos parsing error.
May 21 2024, 8:36 AM · Patch-For-Review, Data Products, Data-Engineering, Observability-Logging, Traffic
Fabfur added a subtask for T365441: Umbrella task for Benthos parsing error: T365117: HAProxy log format doesn't support "invalid" request path.
May 21 2024, 8:36 AM · Data-Engineering, Observability-Logging, Traffic
Fabfur created T365441: Umbrella task for Benthos parsing error.
May 21 2024, 8:36 AM · Data-Engineering, Observability-Logging, Traffic
Fabfur updated the task description for T361845: Add metrics to Benthos.
May 21 2024, 8:26 AM · Observability-Logging, Traffic
Fabfur added a comment to T365117: HAProxy log format doesn't support "invalid" request path.

Update: I opened this issue upstream to get another opinion on this and, if needed, have this behavior fixed.

May 21 2024, 8:08 AM · Patch-For-Review, Data Products, Data-Engineering, Observability-Logging, Traffic

May 16 2024

Fabfur added a comment to T365117: HAProxy log format doesn't support "invalid" request path.

Some other information about this:

May 16 2024, 6:10 PM · Patch-For-Review, Data Products, Data-Engineering, Observability-Logging, Traffic
Fabfur created T365117: HAProxy log format doesn't support "invalid" request path.
May 16 2024, 9:50 AM · Patch-For-Review, Data Products, Data-Engineering, Observability-Logging, Traffic

May 14 2024

Fabfur added a comment to T351117: Move analytics log from Varnish to HAProxy.

adopt topic names that follow EP conventions: <dc>.<topic_name>

I'm sorry for not thinking about this earlier. There is a bit of a design flaw in the use of data center as a topic prefix, and really, for topics that are never mirrored to other Kafka clusters, there is no need for topic prefixes at all.

I just added documentation about this here:
https://wikitech.wikimedia.org/wiki/Kafka#Data_center_topic_prefixing_design_flaw

Given that, and the ever expanding list of data centers, and the fact that webrequest is the only stream we have that is produced to from non main data centers, I think we should not use topic prefixing for webrequest.

All producers should use the same topic name, independent of which data center they are in.

Thanks for clarifying @Ottomata.

@Ottomata @Fabfur If we remove prefixing, there is a potential clash between varnishkafka and benthos topics.
How about we name the production Haproxy/benthos topics as follows?

  • webrequest_frontent_text
  • webrequest_frontent_text.error
  • webrequest_frontent_upload
  • webrequest_frontent_upload.error
May 14 2024, 12:46 PM · Data Products, Patch-For-Review, Data-Engineering, Observability-Logging, Traffic

May 13 2024

Fabfur closed T359053: Reimage one of each Traffic hosts before magru as Resolved.

I'd say this could be closed...

May 13 2024, 7:01 AM · Traffic

May 10 2024

Fabfur closed T364543: support stdin/stdout/stderr in benthos unit file as Declined.
May 10 2024, 9:48 AM · observability, Traffic

May 9 2024

Fabfur created T364543: support stdin/stdout/stderr in benthos unit file.
May 9 2024, 1:32 PM · observability, Traffic
Fabfur closed T358107: Change mtail configuration to ignore new fields in HAProxy logs, a subtask of T351117: Move analytics log from Varnish to HAProxy, as Resolved.
May 9 2024, 9:03 AM · Data Products, Patch-For-Review, Data-Engineering, Observability-Logging, Traffic
Fabfur closed T358107: Change mtail configuration to ignore new fields in HAProxy logs as Resolved.
May 9 2024, 9:03 AM · Patch-For-Review, Data Products, Data-Engineering, Observability-Logging, Traffic
Fabfur closed T360642: Remove extra fields currently sent to Kafka as Resolved.
May 9 2024, 9:02 AM · Event-Platform, Patch-For-Review, Data-Engineering, Observability-Logging, Traffic
Fabfur closed T360642: Remove extra fields currently sent to Kafka, a subtask of T358109: Install new Benthos instance on cp hosts, as Resolved.
May 9 2024, 9:02 AM · Patch-For-Review, Data-Engineering, Observability-Logging, Traffic
Fabfur closed T363420: Use definitive field naming during grok parsing instead of renaming later (in mapping) as Resolved.
May 9 2024, 9:02 AM · Patch-For-Review, Observability-Logging, Traffic
Fabfur closed T363420: Use definitive field naming during grok parsing instead of renaming later (in mapping), a subtask of T358109: Install new Benthos instance on cp hosts, as Resolved.
May 9 2024, 9:01 AM · Patch-For-Review, Data-Engineering, Observability-Logging, Traffic
Fabfur added a comment to T364379: Benthos loses messages when under high load.

Little addendum: the latest Benthos release introduced the auto_replay_nacks configuration parameter in (all?) inputs: https://www.benthos.dev/docs/components/inputs/socket_server#auto_replay_nacks

May 9 2024, 8:55 AM · Patch-For-Review, Data-Engineering, Observability-Logging, Traffic

May 8 2024

Fabfur updated subscribers of T364379: Benthos loses messages when under high load.

@CDanis helped me a lot in this direction and found a workaround/solution for this specific issue: optimizing the Benthos configuration and using a higher UDP receive buffer to mitigate very high numbers of concurrent requests.

May 8 2024, 9:42 PM · Patch-For-Review, Data-Engineering, Observability-Logging, Traffic
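
The larger UDP receive buffer mentioned in the T364379 comment above can be illustrated at the socket level. This is a minimal sketch, not the Benthos configuration used on the cp hosts: the function name `open_udp_listener` and the 4 MiB value are hypothetical, and the size actually chosen is not stated in the feed.

```python
import socket

# Hypothetical value for illustration only; the size used in T364379 is not stated here.
REQUESTED_RCVBUF_BYTES = 4 * 1024 * 1024


def open_udp_listener(host: str, port: int, rcvbuf: int) -> socket.socket:
    """Create a UDP socket and ask the kernel for a larger receive buffer."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    # The kernel may clamp this request (on Linux, to net.core.rmem_max).
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF, rcvbuf)
    sock.bind((host, port))
    return sock


if __name__ == "__main__":
    listener = open_udp_listener("127.0.0.1", 0, REQUESTED_RCVBUF_BYTES)
    # Read back what the kernel actually granted; it can differ from the request.
    granted = listener.getsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF)
    print(f"requested={REQUESTED_RCVBUF_BYTES} granted={granted}")
    listener.close()
```

A larger buffer gives the receiver more room to absorb bursts before the kernel starts dropping datagrams, which is why reading back the granted size is worth doing: a silently clamped request would leave the drop behavior unchanged.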

May 7 2024

Fabfur created T364383: Update fifo_log_demux puppet module to support new parameters.
May 7 2024, 11:56 AM · Patch-For-Review, Traffic
Fabfur created T364379: Benthos loses messages when under high load.
May 7 2024, 10:45 AM · Patch-For-Review, Data-Engineering, Observability-Logging, Traffic

Apr 30 2024

Fabfur updated the task description for T362729: Q4:rack/setup/install cp70[01-16].
Apr 30 2024, 11:04 PM · Traffic, ops-magru, DC-Ops
Fabfur updated the task description for T362729: Q4:rack/setup/install cp70[01-16].
Apr 30 2024, 8:02 PM · Traffic, ops-magru, DC-Ops

Apr 26 2024

Fabfur updated subscribers of T351117: Move analytics log from Varnish to HAProxy.
  • there's a couple of CRs pending (linked to this phab) and I'd like to have a second run on the event schema naming conventions (cc / @Fabfur). We might want to drop the webrequest_source since we don't currently use it in ETL (it's inferred from the HDFS path, not schema).
Apr 26 2024, 10:05 AM · Data Products, Patch-For-Review, Data-Engineering, Observability-Logging, Traffic

Apr 24 2024

Fabfur created T363420: Use definitive field naming during grok parsing instead of renaming later (in mapping).
Apr 24 2024, 10:46 PM · Patch-For-Review, Observability-Logging, Traffic

Apr 19 2024

Fabfur reopened T351117: Move analytics log from Varnish to HAProxy as "In Progress".

The haproxy_id field has been added to messages.

Apr 19 2024, 10:53 AM · Data Products, Patch-For-Review, Data-Engineering, Observability-Logging, Traffic

Apr 18 2024

Fabfur created T362902: Add probenet configuration for magru.
Apr 18 2024, 3:29 PM · probenet, MW-1.43-notes (1.43.0-wmf.3; 2024-04-30), Patch-For-Review, netops, Infrastructure-Foundations, SRE
Fabfur added a comment to T351117: Move analytics log from Varnish to HAProxy.

About the sequence issue, that's the most plausible hypothesis. We could append (or prepend) other pieces of information to the sequence number (like the haproxy process id) to avoid duplicates, but we couldn't guarantee the monotonic increase (or the increase, even) in this case. I suggest using the current approach for the moment and reworking it later if needed.

Ack and +1 to your proposal. IMHO it's easier to be resilient to reloads than to work around non-monotonicity. Do you maybe have a feel for how often haproxy reloads are expected to happen once in prod? Could we assume they are sporadic events?

Apr 18 2024, 1:47 PM · Data Products, Patch-For-Review, Data-Engineering, Observability-Logging, Traffic
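
The trade-off discussed in the T351117 comment above (a process id added to the sequence number removes duplicates but does not restore ordering) can be sketched as follows. Everything here is hypothetical: the pids, the sample stream, and the helper names are invented for illustration, and the sketch assumes that a reload starts a new process whose counter restarts from 1.

```python
from typing import List, Tuple


def composite_id(process_id: int, sequence: int) -> str:
    """Join a process id and a per-process sequence number into one identifier."""
    return f"{process_id}-{sequence}"


def has_duplicates(values: List) -> bool:
    """True if any value appears more than once."""
    return len(set(values)) != len(values)


def is_strictly_increasing(values: List) -> bool:
    """True if every value is greater than the one before it."""
    return all(a < b for a, b in zip(values, values[1:]))


# Hypothetical stream of (process_id, sequence) pairs around one reload.
# Assumption: the new process has a different pid and restarts its counter.
stream: List[Tuple[int, int]] = [(4242, 1), (4242, 2), (4242, 3), (1337, 1), (1337, 2)]

bare_sequences = [seq for _, seq in stream]
composite_ids = [composite_id(pid, seq) for pid, seq in stream]

if __name__ == "__main__":
    print("bare sequence has duplicates:", has_duplicates(bare_sequences))
    print("composite ids have duplicates:", has_duplicates(composite_ids))
    print("bare sequence strictly increasing:", is_strictly_increasing(bare_sequences))
```

Under these assumptions the composite identifier is unique across the reload, but neither it nor the bare sequence gives a consumer a value it can rely on to increase, which matches the concern raised in the comment.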
Fabfur added a comment to T351117: Move analytics log from Varnish to HAProxy.

I agree with @gmodena on all topics, more specifically:

Apr 18 2024, 12:34 PM · Data Products, Patch-For-Review, Data-Engineering, Observability-Logging, Traffic
Fabfur added a comment to T350179: Reimage cookbook on new eqiad hosts stuck at PXE booting.

I'd like to join the chorus of thanks to Papaul: you resolved a very nasty and long-running issue for us here! Thanks again!

Apr 18 2024, 8:12 AM · SRE, Traffic, SRE-swift-storage, ops-codfw, DC-Ops, ops-eqiad

Apr 17 2024

BCornwall awarded T350179: Reimage cookbook on new eqiad hosts stuck at PXE booting a 100 token.
Apr 17 2024, 6:30 AM · SRE, Traffic, SRE-swift-storage, ops-codfw, DC-Ops, ops-eqiad

Apr 11 2024

Fabfur closed T360430: esams text cp nvme upgrade as Resolved.
Apr 11 2024, 2:46 PM · SRE, Traffic, ops-esams, DC-Ops
Fabfur updated the task description for T360430: esams text cp nvme upgrade.
Apr 11 2024, 9:38 AM · SRE, Traffic, ops-esams, DC-Ops

Apr 10 2024

Fabfur updated the task description for T360430: esams text cp nvme upgrade.
Apr 10 2024, 8:58 AM · SRE, Traffic, ops-esams, DC-Ops

Apr 9 2024

Fabfur added a comment to P60144 Bash function: open puppet repo path in Gerrit web UI.

on Linux I usually use xdg-open

Apr 9 2024, 8:27 PM · SRE

Apr 4 2024

Fabfur updated the task description for T360430: esams text cp nvme upgrade.
Apr 4 2024, 5:11 PM · SRE, Traffic, ops-esams, DC-Ops
Fabfur created T361845: Add metrics to Benthos.
Apr 4 2024, 2:36 PM · Observability-Logging, Traffic

Apr 3 2024

Fabfur updated the task description for T360430: esams text cp nvme upgrade.
Apr 3 2024, 9:24 AM · SRE, Traffic, ops-esams, DC-Ops

Apr 2 2024

Fabfur updated the task description for T360430: esams text cp nvme upgrade.
Apr 2 2024, 2:42 PM · SRE, Traffic, ops-esams, DC-Ops
Fabfur added a comment to T360430: esams text cp nvme upgrade.

cp3066 has been reimaged successfully, no evidence of errors

Apr 2 2024, 2:42 PM · SRE, Traffic, ops-esams, DC-Ops

Mar 27 2024

Fabfur updated the task description for T360430: esams text cp nvme upgrade.
Mar 27 2024, 2:56 PM · SRE, Traffic, ops-esams, DC-Ops
Fabfur added a comment to T360430: esams text cp nvme upgrade.

esams has been repooled at 12:15UTC

Mar 27 2024, 2:55 PM · SRE, Traffic, ops-esams, DC-Ops
Fabfur updated the task description for T360430: esams text cp nvme upgrade.
Mar 27 2024, 12:12 PM · SRE, Traffic, ops-esams, DC-Ops
Fabfur updated the task description for T360430: esams text cp nvme upgrade.
Mar 27 2024, 12:00 PM · SRE, Traffic, ops-esams, DC-Ops
Fabfur updated the task description for T360430: esams text cp nvme upgrade.
Mar 27 2024, 11:50 AM · SRE, Traffic, ops-esams, DC-Ops
Fabfur updated the task description for T360430: esams text cp nvme upgrade.
Mar 27 2024, 10:02 AM · SRE, Traffic, ops-esams, DC-Ops
Fabfur updated the task description for T360430: esams text cp nvme upgrade.
Mar 27 2024, 7:24 AM · SRE, Traffic, ops-esams, DC-Ops
Fabfur added a comment to T360430: esams text cp nvme upgrade.

ESAMS DC started depooling @05:58UTC

Mar 27 2024, 6:00 AM · SRE, Traffic, ops-esams, DC-Ops

Mar 25 2024

Fabfur updated the task description for T358109: Install new Benthos instance on cp hosts.
Mar 25 2024, 11:34 AM · Patch-For-Review, Data-Engineering, Observability-Logging, Traffic
Fabfur added a comment to T360642: Remove extra fields currently sent to Kafka.

meta.id and meta.request_id

meta.id is used to uniquely identify an event, and it is usually used for deduplication.

Mar 25 2024, 10:05 AM · Event-Platform, Patch-For-Review, Data-Engineering, Observability-Logging, Traffic
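
The deduplication role of meta.id described in the T360642 comment above can be sketched like this. It is an illustrative in-memory version only, not the pipeline's actual mechanism: the function name `deduplicate_by_meta_id` and the sample events are hypothetical.

```python
from typing import Dict, Iterable, Iterator


def deduplicate_by_meta_id(events: Iterable[Dict]) -> Iterator[Dict]:
    """Yield each event once, keyed on meta.id; events without an id pass through."""
    seen = set()
    for event in events:
        event_id = event.get("meta", {}).get("id")
        if event_id is None:
            # Nothing to key on, so this event cannot be deduplicated.
            yield event
            continue
        if event_id in seen:
            continue
        seen.add(event_id)
        yield event


if __name__ == "__main__":
    # Hypothetical events; real webrequest events carry many more fields.
    sample = [
        {"meta": {"id": "a"}, "uri_path": "/wiki/Foo"},
        {"meta": {"id": "a"}, "uri_path": "/wiki/Foo"},
        {"meta": {"id": "b"}, "uri_path": "/wiki/Bar"},
    ]
    for event in deduplicate_by_meta_id(sample):
        print(event["meta"]["id"], event["uri_path"])
```

The sketch also shows what is lost if meta.id is dropped for performance reasons: events without an id simply cannot be deduplicated this way. The `seen` set grows without bound, so a real consumer would need a windowed or persistent store instead.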

Mar 22 2024

Fabfur closed T352098: Requesting access to "researchers" and "analytics-privatedata-users" for Xiao Xiao as Resolved.
Mar 22 2024, 4:05 PM · Patch-For-Review, SRE, SRE-Access-Requests
Fabfur closed T360639: Grant Access to ldap/WMF for zoe as Resolved.

User added to the ldap/wmf group, please let me know if you can use these services.

Mar 22 2024, 2:40 PM · Patch-For-Review, SRE, LDAP-Access-Requests
Fabfur closed T360367: Grant Access to ldap/wmf for Katie Coleman as Resolved.
Mar 22 2024, 2:39 PM · Patch-For-Review, SRE, LDAP-Access-Requests
Fabfur added a comment to T360367: Grant Access to ldap/wmf for Katie Coleman.

User added to the ldap/wmf group, please let me know if you can use these services.

Mar 22 2024, 2:36 PM · Patch-For-Review, SRE, LDAP-Access-Requests
Fabfur added a member for WMF-NDA: KColeman-WMF.
Mar 22 2024, 2:32 PM
Fabfur added a comment to T360367: Grant Access to ldap/wmf for Katie Coleman.

Thanks, I'll notify you soon with the confirmation!

Mar 22 2024, 2:19 PM · Patch-For-Review, SRE, LDAP-Access-Requests
Fabfur added a member for WMF-NDA: zoe.
Mar 22 2024, 1:20 PM
Fabfur claimed T360639: Grant Access to ldap/WMF for zoe.
Mar 22 2024, 1:10 PM · Patch-For-Review, SRE, LDAP-Access-Requests
Fabfur claimed T360367: Grant Access to ldap/wmf for Katie Coleman.
Mar 22 2024, 1:09 PM · Patch-For-Review, SRE, LDAP-Access-Requests
Fabfur added a comment to T360367: Grant Access to ldap/wmf for Katie Coleman.

Hello, thanks for this request. Could your direct manager please confirm this? (It's sufficient to respond to this ticket.)

Mar 22 2024, 1:09 PM · Patch-For-Review, SRE, LDAP-Access-Requests
Fabfur created T360766: Return 403 to non HEAD|GET requests in HAProxy tls frontend.
Mar 22 2024, 11:52 AM · Traffic
Fabfur added a comment to T360642: Remove extra fields currently sent to Kafka.

These are the fields that are sent from Benthos that aren't present in the current webrequest stream:

FWIW meta and $schema are not part of webrequest, but are requirements for EP integration: https://wikitech.wikimedia.org/wiki/Event_Platform/Schemas#Required_event_data. Usually they are managed by our node/java producers.

We surely don't need two UUIDs (generated by different parts of the processing pipeline): they are expensive to generate under heavy load and could waste bandwidth and space on Kafka.

I need to investigate the historical reason behind both meta.id and meta.request_id, but if performance is a concern I think we can live without meta.id. meta itself is a required field in the webrequest schema, but payload should validate with missing/empty id.

kafka-jumbo and hadoop should be fine (storage wise), but I do appreciate that at webrequest scale every byte sent over the wire counts (and adds up quickly).
For my own education, do you have some datapoints that show how much overhead uuidv4 in benthos produces (cpu-wise)?

Mar 22 2024, 8:56 AM · Event-Platform, Patch-For-Review, Data-Engineering, Observability-Logging, Traffic

Mar 21 2024

Fabfur created T360642: Remove extra fields currently sent to Kafka.
Mar 21 2024, 2:53 PM · Event-Platform, Patch-For-Review, Data-Engineering, Observability-Logging, Traffic

Mar 20 2024

Fabfur closed T360462: Access to rua-dmarc@wikimedia.org as Resolved.

Sent information about dmarc address privately (mail) to the ticket author

Mar 20 2024, 12:01 PM · SRE, SRE-Access-Requests, Fundraising-Backlog
Fabfur claimed T360462: Access to rua-dmarc@wikimedia.org.
Mar 20 2024, 11:19 AM · SRE, SRE-Access-Requests, Fundraising-Backlog

Mar 19 2024

Fabfur closed T359627: Benthos: better management for unparsable logs, a subtask of T351117: Move analytics log from Varnish to HAProxy, as Resolved.
Mar 19 2024, 11:26 PM · Data Products, Patch-For-Review, Data-Engineering, Observability-Logging, Traffic
Fabfur closed T359627: Benthos: better management for unparsable logs as Resolved.
Mar 19 2024, 11:26 PM · Patch-For-Review, Data-Engineering, Observability-Logging, Traffic
Fabfur closed T360415: Review haproxy captured header lengths, a subtask of T351117: Move analytics log from Varnish to HAProxy, as Resolved.
Mar 19 2024, 11:26 PM · Data Products, Patch-For-Review, Data-Engineering, Observability-Logging, Traffic
Fabfur closed T360415: Review haproxy captured header lengths as Resolved.
Mar 19 2024, 11:26 PM · Traffic
Fabfur created T360454: Better Benthos performances.
Mar 19 2024, 5:09 PM · Patch-For-Review, Data-Engineering, Observability-Logging, Traffic
Fabfur closed T360450: Add $schema key to Benthos payload as Resolved.
Mar 19 2024, 5:07 PM · Event-Platform, Patch-For-Review, Data-Engineering, Observability-Logging, Traffic
Fabfur closed T360450: Add $schema key to Benthos payload, a subtask of T358109: Install new Benthos instance on cp hosts, as Resolved.
Mar 19 2024, 5:06 PM · Patch-For-Review, Data-Engineering, Observability-Logging, Traffic
Fabfur created T360450: Add $schema key to Benthos payload.
Mar 19 2024, 3:50 PM · Event-Platform, Patch-For-Review, Data-Engineering, Observability-Logging, Traffic
Fabfur triaged T359627: Benthos: better management for unparsable logs as Low priority.

Even without metrics generation, this has been fixed with a small processing step on the input side.
Leaving the ticket in progress to add code for metric generation.

Mar 19 2024, 10:58 AM · Patch-For-Review, Data-Engineering, Observability-Logging, Traffic
Fabfur changed the status of T358109: Install new Benthos instance on cp hosts from Open to In Progress.
Mar 19 2024, 10:56 AM · Patch-For-Review, Data-Engineering, Observability-Logging, Traffic
Fabfur changed the status of T358109: Install new Benthos instance on cp hosts, a subtask of T351117: Move analytics log from Varnish to HAProxy, from Open to In Progress.
Mar 19 2024, 10:55 AM · Data Products, Patch-For-Review, Data-Engineering, Observability-Logging, Traffic