
JAllemandou (joal)
Data Engineer

User Details

User Since
Feb 11 2015, 6:02 PM (582 w, 2 d)
Availability
Available
IRC Nick
joal
LDAP User
Unknown
MediaWiki User
JAllemandou (WMF) [ Global Accounts ]

Recent Activity

Yesterday

JAllemandou closed T420434: Analyze SQL queries generating metrics, a subtask of T418032: Weekly delivery cadence of core contributor metrics, as Resolved.
Fri, Apr 10, 8:08 AM · Data-Engineering (Q4 FS25/26 April 1st - June 30st)
JAllemandou closed T420434: Analyze SQL queries generating metrics as Resolved.

I've written a plan for Incremental-Mediawiki-History here: https://docs.google.com/document/d/1QZNCZhsBCxEKwogI8S1GFtELTPa0t9DYUBFoc3jI-oo/edit?tab=t.0
Calling this done.

Fri, Apr 10, 8:08 AM · Data-Engineering (Q4 FS25/26 April 1st - June 30st), Data-Platform-SRE (2026-03-27 - 2026-04-17)
JAllemandou moved T420434: Analyze SQL queries generating metrics from In progress to Done on the Data-Engineering (Q4 FS25/26 April 1st - June 30st) board.
Fri, Apr 10, 8:07 AM · Data-Engineering (Q4 FS25/26 April 1st - June 30st), Data-Platform-SRE (2026-03-27 - 2026-04-17)
JAllemandou placed T422030: Surge in webrequest validation check up for grabs.
Fri, Apr 10, 8:07 AM · Patch-For-Review, Data-Platform-SRE (2026-03-27 - 2026-04-17), Data-Engineering (Q4 FS25/26 April 1st - June 30st), Traffic
JAllemandou added a comment to T422030: Surge in webrequest validation check.

Removing myself as the task assignee so that someone else can take it while I'm on holidays.

Fri, Apr 10, 8:07 AM · Patch-For-Review, Data-Platform-SRE (2026-03-27 - 2026-04-17), Data-Engineering (Q4 FS25/26 April 1st - June 30st), Traffic
JAllemandou moved T422030: Surge in webrequest validation check from Next Up to In progress on the Data-Engineering (Q4 FS25/26 April 1st - June 30st) board.
Fri, Apr 10, 8:06 AM · Patch-For-Review, Data-Platform-SRE (2026-03-27 - 2026-04-17), Data-Engineering (Q4 FS25/26 April 1st - June 30st), Traffic
JAllemandou closed T422033: Investigate raise in Invalid HAProxyKafka messages in esams as Resolved.

Having discussed this with Traffic: this was related to an SSL handshake problem (regular traffic) that was incorrectly logged by HAProxy in v3.0. The logging is now fixed with v3.2. Calling this done.

Fri, Apr 10, 8:05 AM · Data-Engineering (Q4 FS25/26 April 1st - June 30st), Traffic
JAllemandou moved T422033: Investigate raise in Invalid HAProxyKafka messages in esams from Next Up to Done on the Data-Engineering (Q4 FS25/26 April 1st - June 30st) board.
Fri, Apr 10, 8:04 AM · Data-Engineering (Q4 FS25/26 April 1st - June 30st), Traffic
JAllemandou moved T422030: Surge in webrequest validation check from In Progress to Blocked/Waiting on the Data-Platform-SRE (2026-03-27 - 2026-04-17) board.
Fri, Apr 10, 8:03 AM · Patch-For-Review, Data-Platform-SRE (2026-03-27 - 2026-04-17), Data-Engineering (Q4 FS25/26 April 1st - June 30st), Traffic
JAllemandou added a comment to T418032: Weekly delivery cadence of core contributor metrics.

Here's my proposed plan for an Incremental-Mediawiki-History: https://docs.google.com/document/d/1QZNCZhsBCxEKwogI8S1GFtELTPa0t9DYUBFoc3jI-oo/edit?tab=t.0
I know the team will discuss this next week in Dublin :)

Fri, Apr 10, 8:03 AM · Data-Engineering (Q4 FS25/26 April 1st - June 30st)

Thu, Apr 9

JAllemandou moved T422030: Surge in webrequest validation check from Backlog - project to In Progress on the Data-Platform-SRE (2026-03-27 - 2026-04-17) board.
Thu, Apr 9, 8:04 AM · Patch-For-Review, Data-Platform-SRE (2026-03-27 - 2026-04-17), Data-Engineering (Q4 FS25/26 April 1st - June 30st), Traffic
JAllemandou added a project to T422030: Surge in webrequest validation check: Data-Platform-SRE (2026-03-27 - 2026-04-17).
Thu, Apr 9, 8:04 AM · Patch-For-Review, Data-Platform-SRE (2026-03-27 - 2026-04-17), Data-Engineering (Q4 FS25/26 April 1st - June 30st), Traffic

Wed, Apr 8

JAllemandou added a comment to T422030: Surge in webrequest validation check.

Summarizing here a talk we had on Slack with @Vgutierrez and @Fabfur:

  • In v3.0 we were experiencing unexpected sequence-id increments. This is fixed with v3.2 as of today.
  • We were not logging a lot of lines seen as invalid messages in HAProxyKafka, and this is fixed with v3.2.
Wed, Apr 8, 2:03 PM · Patch-For-Review, Data-Platform-SRE (2026-03-27 - 2026-04-17), Data-Engineering (Q4 FS25/26 April 1st - June 30st), Traffic
JAllemandou added a comment to T422030: Surge in webrequest validation check.

Yes, sequence numbers are generated by haproxy itself, even when the request results in an SSL handshake error. With haproxy 3.0, the sequence number doesn't reach haproxykafka in that case, because the log format is ignored for that kind of error.

With 3.2, the log format is honored and haproxykafka sees all these sequence numbers.

Wed, Apr 8, 8:47 AM · Patch-For-Review, Data-Platform-SRE (2026-03-27 - 2026-04-17), Data-Engineering (Q4 FS25/26 April 1st - June 30st), Traffic
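
The reasoning in the comment above (sequence numbers generated by haproxy but never reaching haproxykafka) can be illustrated with a small gap check. This is only a sketch, assuming sequence ids are consecutive integers per host; the function name and the sample data are hypothetical and are not taken from the actual webrequest validation code.

```python
def count_missing_sequence_ids(seen_ids):
    """Count ids between the smallest and largest seen id that were never observed."""
    unique = set(seen_ids)
    if not unique:
        return 0
    # Number of ids we would expect if nothing had been dropped.
    expected = max(unique) - min(unique) + 1
    return expected - len(unique)


# Hypothetical sample: ids 3 and 6 were generated upstream but never logged.
seen = [1, 2, 4, 5, 7]
missing = count_missing_sequence_ids(seen)
print(f"missing sequence ids: {missing}")
```

Under this assumption, log lines that are dropped before reaching the consumer show up as gaps, while a version that logs every line would show none.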
JAllemandou added a comment to T422030: Surge in webrequest validation check.

Thanks for confirming the invalid-events change @Vgutierrez.
There still is something I don't understand:

  • The pattern we see in v3.0 seems to show that haproxykafka generates a sequence-id even when it discards a log as invalid, as we see 2x the number of invalid seqId per BADREQ.
  • With v3.2 we see a big increase in BADREQ that doesn't correlate with the number of invalid sequence-ids we were reporting previously: there are way more.
Wed, Apr 8, 8:17 AM · Patch-For-Review, Data-Platform-SRE (2026-03-27 - 2026-04-17), Data-Engineering (Q4 FS25/26 April 1st - June 30st), Traffic

Tue, Apr 7

JAllemandou added a comment to T422030: Surge in webrequest validation check.

An interesting change in behavior from 3.0 to 3.2 that could be related: after the upgrade, the haproxykafka number of invalid messages dropped to 0: https://grafana.wikimedia.org/goto/afi95ztecv20wd?orgId=1

Tue, Apr 7, 8:26 AM · Patch-For-Review, Data-Platform-SRE (2026-03-27 - 2026-04-17), Data-Engineering (Q4 FS25/26 April 1st - June 30st), Traffic
JAllemandou updated subscribers of T413362: Move Mostcategories computation to Hadoop.

Ping @Ahoelzl on this. There are patches to review that the team doesn't know about.

Tue, Apr 7, 8:15 AM · Patch-For-Review, Data-Engineering-Icebox, Data-Engineering, DBA
JAllemandou added a comment to T420412: Implement list of JA3N-JA4H pairs to be tagged as automated into the bot detection pipeline.

Plan looks good to me :)

Tue, Apr 7, 8:14 AM · Data-Engineering (Q4 FS25/26 April 1st - June 30st)

Thu, Apr 2

JAllemandou renamed T422030: Surge in webrequest validation check from Surge in webrequest sequence-id validation check to Surge in webrequest validation check.
Thu, Apr 2, 6:42 AM · Patch-For-Review, Data-Platform-SRE (2026-03-27 - 2026-04-17), Data-Engineering (Q4 FS25/26 April 1st - June 30st), Traffic
JAllemandou updated the task description for T422030: Surge in webrequest validation check.
Thu, Apr 2, 6:42 AM · Patch-For-Review, Data-Platform-SRE (2026-03-27 - 2026-04-17), Data-Engineering (Q4 FS25/26 April 1st - June 30st), Traffic

Wed, Apr 1

JAllemandou updated subscribers of T420974: when analyzing a Wikifunctions dump, parent_id in page creation revisions is sometimes 0 and sometimes None.

That's interesting!
@Ottomata could you have a look at the event side of things? This could mean a bug, right?

Wed, Apr 1, 4:07 PM · Data-Engineering (Q4 FS25/26 April 1st - June 30st), Dumps-Generation
JAllemandou updated the task description for T422030: Surge in webrequest validation check.
Wed, Apr 1, 2:24 PM · Patch-For-Review, Data-Platform-SRE (2026-03-27 - 2026-04-17), Data-Engineering (Q4 FS25/26 April 1st - June 30st), Traffic
JAllemandou updated the task description for T422030: Surge in webrequest validation check.
Wed, Apr 1, 2:24 PM · Patch-For-Review, Data-Platform-SRE (2026-03-27 - 2026-04-17), Data-Engineering (Q4 FS25/26 April 1st - June 30st), Traffic
JAllemandou updated the task description for T422033: Investigate raise in Invalid HAProxyKafka messages in esams.
Wed, Apr 1, 2:23 PM · Data-Engineering (Q4 FS25/26 April 1st - June 30st), Traffic
JAllemandou created T422033: Investigate raise in Invalid HAProxyKafka messages in esams.
Wed, Apr 1, 2:16 PM · Data-Engineering (Q4 FS25/26 April 1st - June 30st), Traffic
JAllemandou created T422030: Surge in webrequest validation check.
Wed, Apr 1, 2:05 PM · Patch-For-Review, Data-Platform-SRE (2026-03-27 - 2026-04-17), Data-Engineering (Q4 FS25/26 April 1st - June 30st), Traffic
JAllemandou added a comment to T418804: table_maintenance_iceberg_monthly permission issue fails task due to permission on Ivy cache artifact.

I experienced the same issue again today:

Exception in thread "main" java.io.FileNotFoundException: /tmp/table_maintenance_iceberg_monthly/ivy_spark3/cache/resolved-org.apache.spark-spark-submit-parent-73ae20fa-2b58-4c79-9568-c95b98695cd1-1.0.xml (Permission denied)

I'll ask Ben to do the cleanup.

Wed, Apr 1, 8:32 AM · Data-Engineering

Tue, Mar 31

JAllemandou created T421952: Make `analytics-admin` users able to impersonate any `analytics-*` user.
Tue, Mar 31, 6:20 PM · Data-Platform-SRE (2026-03-27 - 2026-04-17)
JAllemandou added a comment to T297231: [Data Quality] Sending Apache Spark metrics to PushGateway.

I wish to revive this task, maybe not sending data to push-gateway at first, but at least storing metrics in ways that allow the DE team to access them. Inspecting how Spark behaves internally will be key for our migration from Hadoop to k8s.

Tue, Mar 31, 6:12 PM · Data-Engineering (Q4 FS25/26 April 1st - June 30st), Observability-Metrics
JAllemandou added a comment to T419436: Investigate Gobblin failures.

2 failed instances this morning:

  • 2x connection problem to metawiki API.
Tue, Mar 31, 7:58 AM · Data-Engineering (Q4 FS25/26 April 1st - June 30st), Data-Platform-SRE (2026-03-27 - 2026-04-17)
JAllemandou moved T332215: Airflow skein hook shouldn't fail when not managing to gather yarn logs from Next Up to Done on the Data-Engineering (Q3 FY25/26 January 1st - March 31th) board.
Tue, Mar 31, 7:27 AM · Data-Engineering (Q3 FY25/26 January 1st - March 31th), Data Pipelines
JAllemandou claimed T332215: Airflow skein hook shouldn't fail when not managing to gather yarn logs.
Tue, Mar 31, 7:27 AM · Data-Engineering (Q3 FY25/26 January 1st - March 31th), Data Pipelines
JAllemandou added a comment to T332215: Airflow skein hook shouldn't fail when not managing to gather yarn logs.

A PR has been created and merged for this: https://gitlab.wikimedia.org/repos/data-engineering/airflow-dags/-/merge_requests/2140
It should have belonged to this task.

Tue, Mar 31, 7:26 AM · Data-Engineering (Q3 FY25/26 January 1st - March 31th), Data Pipelines
JAllemandou added a comment to T332215: Airflow skein hook shouldn't fail when not managing to gather yarn logs.

I think I found the culprit for this.
In the stack trace, the error happens at this line:
https://gitlab.wikimedia.org/repos/data-engineering/airflow-dags/-/blob/main/wmf_airflow_common/hooks/skein.py?ref_type=heads#L195
and if we follow the stack trace, this line shows up:
https://gitlab.wikimedia.org/repos/data-engineering/airflow-dags/-/blob/main/wmf_airflow_common/hooks/skein.py?ref_type=heads#L272

Tue, Mar 31, 7:21 AM · Data-Engineering (Q3 FY25/26 January 1st - March 31th), Data Pipelines

Mon, Mar 30

JAllemandou added a comment to T419436: Investigate Gobblin failures.

This is a client side timeout, yes? I wonder what our client timeout is...

Mon, Mar 30, 6:01 PM · Data-Engineering (Q4 FS25/26 April 1st - June 30st), Data-Platform-SRE (2026-03-27 - 2026-04-17)
JAllemandou added a comment to T419162: Task Tries and Logs for Airflow DAGs sometimes unavailable.

I think I found the culprit for this.
In the stack trace, the error happens at this line:
https://gitlab.wikimedia.org/repos/data-engineering/airflow-dags/-/blob/main/wmf_airflow_common/hooks/skein.py?ref_type=heads#L195
and if we follow the stack trace, this line shows up:
https://gitlab.wikimedia.org/repos/data-engineering/airflow-dags/-/blob/main/wmf_airflow_common/hooks/skein.py?ref_type=heads#L272

Mon, Mar 30, 3:36 PM · Data-Engineering (Q4 FS25/26 April 1st - June 30st), Data-Platform-SRE (2026-03-27 - 2026-04-17)
JAllemandou added a comment to T419980: ICU 72 upgrade: `categorylinks` table swap.

@JAllemandou We will postpone the upgrade (i.e. the table hot swap). However, the new tables are already created, though they are not in use. Is that a problem for you?

I don't see how just having the new tables created (and even loaded for the sake of it) could be an issue. Thanks for warning us :)

Mon, Mar 30, 10:33 AM · Data-Engineering-Radar, DBA, Data-Persistence, Data-Engineering, Schema-change, User-Raine, ServiceOps new, ServiceOps-Upgrades-Hardware, ServiceOps-Mediawiki
JAllemandou added a comment to T419436: Investigate Gobblin failures.

More failures from March 27th to March 30th:

Mon, Mar 30, 9:34 AM · Data-Engineering (Q4 FS25/26 April 1st - June 30st), Data-Platform-SRE (2026-03-27 - 2026-04-17)
JAllemandou added a comment to T416123: Define the turnilo global config.

@brouberol : Should we consider this done?

Mon, Mar 30, 8:04 AM · Data-Platform-SRE (2026-03-27 - 2026-04-17)

Fri, Mar 27

JAllemandou added a comment to T419436: Investigate Gobblin failures.

We have experienced failures in the past few days (March 24, 25, 26, 27).
Here's a summary of the detailed failures:
False errors:

  • Airflow log retrieval failure. The underlying task was successful.

Real errors:

  • 8 times connection problem to metawiki API:
FailureRequest to uri https://meta.wikimedia.org/w/api.php?format=json&action=streamconfigs&all_settings=true failed. BasicHttpResult(failure)  encountered local exception: Connect to mw-api-int-ro.discovery.wmnet:4446 [mw-api-int-ro.discovery.wmnet/10.2.2.81] failed: Connection timed out (Connection timed out)
  • Error with file availability in task pod. Possibly related to git sync synchronicity.
skein.exceptions.DriverError: Failed to submit application, exception:
File file:/opt/airflow/dags/.worktrees/9a4d46e7e7fd1cf2774e8a92eec5724f00e250ab/main/dags/gobblin/config/analytics-common.properties does not exist
  • Error with Yarn launching skein app.
org.apache.hadoop.yarn.exceptions.InvalidApplicationMasterRequestException: Application doesn't exist in cache appattempt_1773779850057_1912_000001
Fri, Mar 27, 10:25 AM · Data-Engineering (Q4 FS25/26 April 1st - June 30st), Data-Platform-SRE (2026-03-27 - 2026-04-17)
JAllemandou renamed T419436: Investigate Gobblin failures from Investigate Gobblin failures between 2026-02-16 and 2026-03-07 to Investigate Gobblin failures.
Fri, Mar 27, 9:52 AM · Data-Engineering (Q4 FS25/26 April 1st - June 30st), Data-Platform-SRE (2026-03-27 - 2026-04-17)
JAllemandou moved T419436: Investigate Gobblin failures from Next Up to In progress on the Data-Engineering (Q3 FY25/26 January 1st - March 31th) board.
Fri, Mar 27, 9:51 AM · Data-Engineering (Q4 FS25/26 April 1st - June 30st), Data-Platform-SRE (2026-03-27 - 2026-04-17)
JAllemandou updated subscribers of T420787: Visualizing inconsistencies and reconciles via Superset.

This is awesome work, it will really help building trust in the dataset. Kudos @xcollazo and @APizzata-WMF :)

Fri, Mar 27, 7:19 AM · Data-Engineering (Q4 FS25/26 April 1st - June 30st)

Thu, Mar 26

JAllemandou added a comment to T409462: mediawiki.page_change.v1 event - add a page type field.

This is indeed very relevant @Ottomata . If we could have this info in the event it'd be very useful.

Thu, Mar 26, 4:13 PM · Data-Engineering, Event-Platform

Wed, Mar 25

JAllemandou added a comment to T398236: Manage druid `webrequest_sampled_live` data size.

Current status: The 5 hosts are full at ~75%, with almost 2TB used out of 2.75TB each. This represents ~10TB used. Of those 10TB, webrequest_sampled_live accounts for ~4TB (2TB useful, replicated 2 times), and wmf_netflow for 3.4TB (1.7TB useful, replicated 2 times).
For the moment the cluster holds, but we need to be careful if we wish to continue to grow the datasets.

Wed, Mar 25, 8:16 PM · Data-Engineering
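
The capacity figures quoted in the comment above can be checked with a few lines of arithmetic. This is a sketch only: the per-host usage is rounded to 2 TB (the comment says "almost 2"), and the helper names are hypothetical.

```python
HOSTS = 5
CAPACITY_TB_PER_HOST = 2.75
USED_TB_PER_HOST = 2.0  # rounded up from "almost 2" in the comment
REPLICATION = 2


def cluster_usage(hosts, capacity_per_host, used_per_host):
    """Return (total capacity, total used, fill ratio) for the cluster."""
    total_capacity = hosts * capacity_per_host
    total_used = hosts * used_per_host
    return total_capacity, total_used, total_used / total_capacity


def replicated_size(useful_tb, replication):
    """Disk footprint of a datasource once replicas are counted."""
    return useful_tb * replication


total_capacity, total_used, fill_ratio = cluster_usage(
    HOSTS, CAPACITY_TB_PER_HOST, USED_TB_PER_HOST
)
webrequest_tb = replicated_size(2.0, REPLICATION)  # webrequest_sampled_live
netflow_tb = replicated_size(1.7, REPLICATION)     # wmf_netflow
other_tb = total_used - webrequest_tb - netflow_tb

print(f"capacity={total_capacity} TB, used={total_used} TB, fill={fill_ratio:.0%}")
print(f"webrequest={webrequest_tb} TB, netflow={netflow_tb:.1f} TB, other={other_tb:.1f} TB")
```

With these rounded inputs the fill ratio comes out slightly under the ~75% quoted, and roughly 2.6 TB is left for all other datasources.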

Tue, Mar 24

JAllemandou added a comment to T420412: Implement list of JA3N-JA4H pairs to be tagged as automated into the bot detection pipeline.

Thank you folks for considering my idea :)

Tue, Mar 24, 10:30 AM · Data-Engineering (Q4 FS25/26 April 1st - June 30st)

Mon, Mar 23

JAllemandou added a comment to T420412: Implement list of JA3N-JA4H pairs to be tagged as automated into the bot detection pipeline.

What I like with the valid_until field is the possibility to keep old inactive records, in case we'd have to re-use them for instance, or to remember the state of the filtering in the past. If you still don't want it, please do as you see fit :)

Mon, Mar 23, 7:22 PM · Data-Engineering (Q4 FS25/26 April 1st - June 30st)
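
To illustrate why a valid_until field lets old inactive records stay in the list, here is a minimal sketch. Only the field name valid_until comes from the comment above; the record layout, the fingerprint values, and the convention that None means "no expiry" are assumptions made for the example.

```python
from datetime import date

# Hypothetical JA3N-JA4H records; the values are placeholders.
RULES = [
    {"ja3n": "aaa", "ja4h": "bbb", "valid_until": None},
    {"ja3n": "ccc", "ja4h": "ddd", "valid_until": date(2026, 1, 31)},
]


def active_rules(rules, on_day):
    """Return the rules still in force on on_day (expiry date treated as inclusive)."""
    return [
        rule
        for rule in rules
        if rule["valid_until"] is None or rule["valid_until"] >= on_day
    ]


# Expired rules are kept in RULES, so past filtering state can be reconstructed.
print(len(active_rules(RULES, date(2026, 1, 15))))
print(len(active_rules(RULES, date(2026, 3, 23))))
```

Because expired rows are filtered out at read time instead of being deleted, the same list can answer both "what is tagged today" and "what was tagged on a past date".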
JAllemandou added a comment to T411116: CentralAuth's localuser table contains many nulls and duplicate mappings.

Monday I will delete the file 17 from each snapshot and run msck repair table

Mon, Mar 23, 10:27 AM · Data-Engineering (Q3 FY25/26 January 1st - March 31th), MediaWiki-Platform-Team, MediaWiki-extensions-CentralAuth
JAllemandou added a comment to T419980: ICU 72 upgrade: `categorylinks` table swap.

The big database import (sqoop) into the Data Lake starts on the first of each month at 05:00.

The sqoop jobs fleet starts at midnight UTC on the first of the month, and usually lasts 2 days and a half if everything goes well.
Thank you for considering scheduling your operation at a different time :)

Mon, Mar 23, 8:06 AM · Data-Engineering-Radar, DBA, Data-Persistence, Data-Engineering, Schema-change, User-Raine, ServiceOps new, ServiceOps-Upgrades-Hardware, ServiceOps-Mediawiki

Fri, Mar 20

JAllemandou added a comment to T420434: Analyze SQL queries generating metrics.

Hi @JAllemandou , the metrics have been developed in accordance to the new Contributor measurement strategy. you can see definitions in the following links

Fri, Mar 20, 8:04 AM · Data-Engineering (Q4 FS25/26 April 1st - June 30st), Data-Platform-SRE (2026-03-27 - 2026-04-17)

Thu, Mar 19

JAllemandou added a comment to T420412: Implement list of JA3N-JA4H pairs to be tagged as automated into the bot detection pipeline.

If we went with a Hive table for the bot JA3N-JA4H list, would you prefer it being located in another (non-Iceberg) database, like wmf?

I'd prefer that, but it's not very important.
I'm wondering more about making it Iceberg or not. I see this table as possibly being updated with some regularity, if we automate finding bad actors for instance, or something similar. Also, @GGoncalves-WMF was mentioning the wish to combine the various actor-related tables into a single one at some point.
I don't know how much we wish to make the list for the backfill "future-proof" versus one-off.

Thu, Mar 19, 2:20 PM · Data-Engineering (Q4 FS25/26 April 1st - June 30st)
JAllemandou added a comment to T411116: CentralAuth's localuser table contains many nulls and duplicate mappings.

The file I am talking about is only made up of duplicates, therefore deleting it would just remove duplication

Thu, Mar 19, 2:11 PM · Data-Engineering (Q3 FY25/26 January 1st - March 31th), MediaWiki-Platform-Team, MediaWiki-extensions-CentralAuth
JAllemandou added a comment to T420434: Analyze SQL queries generating metrics.

After some time reading and processing the queries used to generate the metrics asked weekly, here are some findings:

Thu, Mar 19, 2:08 PM · Data-Engineering (Q4 FS25/26 April 1st - June 30st), Data-Platform-SRE (2026-03-27 - 2026-04-17)
JAllemandou added a comment to T411116: CentralAuth's localuser table contains many nulls and duplicate mappings.

Regarding the old data? Since we know the duplicates are always in the same file we could think of dropping these problematic files. What do you think?

Hm, this would mean partially incomplete data. I'd rather have duplicates in my data than incomplete data.
We should nonetheless communicate about this!
I reviewed mediawiki_history code, and my analysis says that we introduced some duplication :(
Given metrics didn't crash, it means the numbers are not huge, but it's not great anyhow.

Thu, Mar 19, 1:53 PM · Data-Engineering (Q3 FY25/26 January 1st - March 31th), MediaWiki-Platform-Team, MediaWiki-extensions-CentralAuth
JAllemandou updated the task description for T420434: Analyze SQL queries generating metrics.
Thu, Mar 19, 1:35 PM · Data-Engineering (Q4 FS25/26 April 1st - June 30st), Data-Platform-SRE (2026-03-27 - 2026-04-17)
JAllemandou added a comment to T411116: CentralAuth's localuser table contains many nulls and duplicate mappings.

I think it's easier to make it happen this way (reducing the mapper weight in sqoop script) than changing puppet. If ok for everyone, let's make it happen (with a comment in the code :) )

Thu, Mar 19, 10:10 AM · Data-Engineering (Q3 FY25/26 January 1st - March 31th), MediaWiki-Platform-Team, MediaWiki-extensions-CentralAuth

Wed, Mar 18

JAllemandou added a comment to T411116: CentralAuth's localuser table contains many nulls and duplicate mappings.

Yet another idea is to use Spark to get this data. But this is quite a change.

Wed, Mar 18, 4:38 PM · Data-Engineering (Q3 FY25/26 January 1st - March 31th), MediaWiki-Platform-Team, MediaWiki-extensions-CentralAuth
JAllemandou added a comment to T411116: CentralAuth's localuser table contains many nulls and duplicate mappings.

What if we split by another column like lu_local_id?

Wed, Mar 18, 4:12 PM · Data-Engineering (Q3 FY25/26 January 1st - March 31th), MediaWiki-Platform-Team, MediaWiki-extensions-CentralAuth
JAllemandou added a comment to T411116: CentralAuth's localuser table contains many nulls and duplicate mappings.

Now that I've made myself a fool by not being precise enough, let's get back to solutions :)
I can see two ways:

  • the one I suggested above
  • reducing the mapper weight for that table. If we go from 0.5 to 0.25, the effect will be to halve the number of mappers, exactly the same as changing it in puppet.
Wed, Mar 18, 4:06 PM · Data-Engineering (Q3 FY25/26 January 1st - March 31th), MediaWiki-Platform-Team, MediaWiki-extensions-CentralAuth
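
The equivalence described in the comment above (halving the mapper weight versus halving the mapper count in puppet) can be sketched as follows. The formula is an assumption for illustration: the feed does not show how the sqoop script actually derives the mapper count, and the base values of 64 and 32 are hypothetical.

```python
import math


def mappers_for_table(base_mappers, weight):
    """Hypothetical rule: scale the configured mapper count by a per-table weight."""
    return max(1, math.ceil(base_mappers * weight))


# Option A: keep the (assumed) base count and halve the table weight.
option_weight = mappers_for_table(base_mappers=64, weight=0.25)

# Option B: halve the base count in configuration and keep the weight.
option_puppet = mappers_for_table(base_mappers=32, weight=0.5)

print(option_weight, option_puppet)
```

Under a multiplicative rule like this one, both options yield the same mapper count for the table, but changing the weight only affects that table while changing the base count affects every table.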
JAllemandou added a comment to T411116: CentralAuth's localuser table contains many nulls and duplicate mappings.

I've read our code and the bug report again, and there is something I don't understand: the bug is supposed to happen when splitting a table on a String type field, but we split on a Long type field:
https://github.com/wikimedia/analytics-refinery/blob/master/python/refinery/sqoop.py#L1320
I'd really like for us to investigate more. Let's sync @APizzata-WMF.

Wed, Mar 18, 3:26 PM · Data-Engineering (Q3 FY25/26 January 1st - March 31th), MediaWiki-Platform-Team, MediaWiki-extensions-CentralAuth
JAllemandou added a comment to T411116: CentralAuth's localuser table contains many nulls and duplicate mappings.

My way of dealing with that would be to change the puppet code to use 32 mappers.
This will involve creating a new variable and updating the template. Not great, but at least we'll have a solution.
And obviously, in addition to the code change, add a comment referencing your previous comment to explain why we do that.

Wed, Mar 18, 3:21 PM · Data-Engineering (Q3 FY25/26 January 1st - March 31th), MediaWiki-Platform-Team, MediaWiki-extensions-CentralAuth
JAllemandou added a comment to T420412: Implement list of JA3N-JA4H pairs to be tagged as automated into the bot detection pipeline.

I think 2 (expiration) is a good place to start and seems less likely to introduce issues in the long term (i.e. I'd rather have a few more incidents than realize we've had bad data for 2 years because of an old rule).

Wed, Mar 18, 11:30 AM · Data-Engineering (Q4 FS25/26 April 1st - June 30st)
JAllemandou moved T348763: Make eventstreams-internal available to WMF staff without an ssh tunnel from In Progress to Blocked/Waiting on the Data-Platform-SRE (2026-03-06 - 2026-03-27) board.
Wed, Mar 18, 8:59 AM · Data-Engineering (Q4 FS25/26 April 1st - June 30st), Data-Platform-SRE (2026-03-27 - 2026-04-17), ServiceOps-Services-Oids, ServiceOps new, Data-Engineering-Radar, Event-Platform
JAllemandou moved T420434: Analyze SQL queries generating metrics from Backlog - project to In Progress on the Data-Platform-SRE (2026-03-06 - 2026-03-27) board.
Wed, Mar 18, 8:59 AM · Data-Engineering (Q4 FS25/26 April 1st - June 30st), Data-Platform-SRE (2026-03-27 - 2026-04-17)
JAllemandou moved T419436: Investigate Gobblin failures from In Progress to Backlog - project on the Data-Platform-SRE (2026-03-06 - 2026-03-27) board.
Wed, Mar 18, 8:59 AM · Data-Engineering (Q4 FS25/26 April 1st - June 30st), Data-Platform-SRE (2026-03-27 - 2026-04-17)
JAllemandou moved T412925: Carry out end-user testing of spark on kubernetes from In progress to Blocked/Paused on the Data-Engineering (Q3 FY25/26 January 1st - March 31th) board.
Wed, Mar 18, 8:59 AM · Data-Engineering (Q4 FS25/26 April 1st - June 30st), Data-Platform-SRE (2026-03-27 - 2026-04-17), Essential-Work
JAllemandou moved T420434: Analyze SQL queries generating metrics from Next Up to In progress on the Data-Engineering (Q3 FY25/26 January 1st - March 31th) board.
Wed, Mar 18, 8:58 AM · Data-Engineering (Q4 FS25/26 April 1st - June 30st), Data-Platform-SRE (2026-03-27 - 2026-04-17)
JAllemandou triaged T420434: Analyze SQL queries generating metrics as High priority.
Wed, Mar 18, 8:58 AM · Data-Engineering (Q4 FS25/26 April 1st - June 30st), Data-Platform-SRE (2026-03-27 - 2026-04-17)
JAllemandou created T420434: Analyze SQL queries generating metrics.
Wed, Mar 18, 8:57 AM · Data-Engineering (Q4 FS25/26 April 1st - June 30st), Data-Platform-SRE (2026-03-27 - 2026-04-17)
JAllemandou added a comment to T418032: Weekly delivery cadence of core contributor metrics.

@Mayakp.wiki, can you please confirm that the SQL code for the metrics defined above is this one? Thank you :)

Wed, Mar 18, 8:06 AM · Data-Engineering (Q4 FS25/26 April 1st - June 30st)
JAllemandou added a comment to T420412: Implement list of JA3N-JA4H pairs to be tagged as automated into the bot detection pipeline.

I have opinions on this indeed :)

Wed, Mar 18, 7:36 AM · Data-Engineering (Q4 FS25/26 April 1st - June 30st)

Tue, Mar 17

JAllemandou placed T416121: Define the turnilo helmfiles up for grabs.
Tue, Mar 17, 3:55 PM · Data-Platform-SRE (2026-03-27 - 2026-04-17)
JAllemandou placed T416119: Define the turnilo kubeconfigs up for grabs.
Tue, Mar 17, 3:55 PM · Data-Platform-SRE (2026-03-27 - 2026-04-17)
JAllemandou placed T416120: Define the turnilo namespaces up for grabs.
Tue, Mar 17, 3:55 PM · Data-Platform-SRE (2026-03-27 - 2026-04-17)
JAllemandou moved T419436: Investigate Gobblin failures from Backlog - project to In Progress on the Data-Platform-SRE (2026-03-06 - 2026-03-27) board.
Tue, Mar 17, 3:54 PM · Data-Engineering (Q4 FS25/26 April 1st - June 30st), Data-Platform-SRE (2026-03-27 - 2026-04-17)

Fri, Mar 13

JAllemandou added a comment to T418606: Include session type in x-analytics header.

I confirm I have data in the datalake for auth_type. However, the numbers for api.wikimedia.org are very low for March 12:

SELECT
    x_analytics_map['auth_type'],
    COUNT(1) AS c
FROM wmf.webrequest
WHERE webrequest_source = 'text'
  AND year = 2026 AND month = 3 AND day = 12
  AND uri_host = 'api.wikimedia.org'
GROUP BY
    x_analytics_map['auth_type']
ORDER BY c DESC
LIMIT 50
Fri, Mar 13, 9:14 AM · MW-1.46-notes (1.46.0-wmf.19; 2026-03-10), MediaWiki-Platform-Team (Q3 Kanban Board), MediaWiki-Core-AuthManager

Thu, Mar 12

JAllemandou moved T419540: Make canary-events use a single airflow task per dg-run instead of one per stream from In progress to Done on the Data-Engineering (Q3 FY25/26 January 1st - March 31th) board.
Thu, Mar 12, 5:37 PM · Patch-For-Review, Data-Platform-SRE (2026-03-06 - 2026-03-27), Data-Engineering (Q3 FY25/26 January 1st - March 31th)
JAllemandou moved T419540: Make canary-events use a single airflow task per dg-run instead of one per stream from In Progress to Done on the Data-Platform-SRE (2026-03-06 - 2026-03-27) board.
Thu, Mar 12, 5:36 PM · Patch-For-Review, Data-Platform-SRE (2026-03-06 - 2026-03-27), Data-Engineering (Q3 FY25/26 January 1st - March 31th)

Mar 11 2026

JAllemandou moved T419540: Make canary-events use a single airflow task per dg-run instead of one per stream from Next Up to In progress on the Data-Engineering (Q3 FY25/26 January 1st - March 31th) board.
Mar 11 2026, 9:54 AM · Patch-For-Review, Data-Platform-SRE (2026-03-06 - 2026-03-27), Data-Engineering (Q3 FY25/26 January 1st - March 31th)
JAllemandou added a comment to T419540: Make canary-events use a single airflow task per dg-run instead of one per stream.

My assumption was that we would rerun the failed Airflow task when a failure happens, generating more canary events than needed.

Mar 11 2026, 8:44 AM · Patch-For-Review, Data-Platform-SRE (2026-03-06 - 2026-03-27), Data-Engineering (Q3 FY25/26 January 1st - March 31th)

Mar 10 2026

JAllemandou claimed T419540: Make canary-events use a single airflow task per dg-run instead of one per stream.
Mar 10 2026, 2:09 PM · Patch-For-Review, Data-Platform-SRE (2026-03-06 - 2026-03-27), Data-Engineering (Q3 FY25/26 January 1st - March 31th)
JAllemandou created T419540: Make canary-events use a single airflow task per dg-run instead of one per stream.
Mar 10 2026, 2:09 PM · Patch-For-Review, Data-Platform-SRE (2026-03-06 - 2026-03-27), Data-Engineering (Q3 FY25/26 January 1st - March 31th)

Mar 9 2026

JAllemandou claimed T419436: Investigate Gobblin failures.
Mar 9 2026, 3:27 PM · Data-Engineering (Q4 FS25/26 April 1st - June 30st), Data-Platform-SRE (2026-03-27 - 2026-04-17)
JAllemandou created T419436: Investigate Gobblin failures.
Mar 9 2026, 3:27 PM · Data-Engineering (Q4 FS25/26 April 1st - June 30st), Data-Platform-SRE (2026-03-27 - 2026-04-17)
JAllemandou moved T416113: Deploy turnilo to dse-k8s-eqiad from In progress to Blocked/Paused on the Data-Engineering (Q3 FY25/26 January 1st - March 31th) board.
Mar 9 2026, 2:41 PM · Data-Platform-SRE (2026-03-27 - 2026-04-17), Data-Engineering (Q3 FY25/26 January 1st - March 31th), Patch-For-Review
JAllemandou moved T348763: Make eventstreams-internal available to WMF staff without an ssh tunnel from Next Up to In progress on the Data-Engineering (Q3 FY25/26 January 1st - March 31th) board.
Mar 9 2026, 2:41 PM · Data-Engineering (Q4 FS25/26 April 1st - June 30st), Data-Platform-SRE (2026-03-27 - 2026-04-17), ServiceOps-Services-Oids, ServiceOps new, Data-Engineering-Radar, Event-Platform
JAllemandou edited projects for T348763: Make eventstreams-internal available to WMF staff without an ssh tunnel, added: Data-Engineering (Q3 FY25/26 January 1st - March 31th); removed Data-Engineering.
Mar 9 2026, 2:41 PM · Data-Engineering (Q4 FS25/26 April 1st - June 30st), Data-Platform-SRE (2026-03-27 - 2026-04-17), ServiceOps-Services-Oids, ServiceOps new, Data-Engineering-Radar, Event-Platform
JAllemandou moved T348763: Make eventstreams-internal available to WMF staff without an ssh tunnel from Backlog - project to In Progress on the Data-Platform-SRE (2026-03-06 - 2026-03-27) board.
Mar 9 2026, 8:41 AM · Data-Engineering (Q4 FS25/26 April 1st - June 30st), Data-Platform-SRE (2026-03-27 - 2026-04-17), ServiceOps-Services-Oids, ServiceOps new, Data-Engineering-Radar, Event-Platform
JAllemandou claimed T348763: Make eventstreams-internal available to WMF staff without an ssh tunnel.
Mar 9 2026, 8:40 AM · Data-Engineering (Q4 FS25/26 April 1st - June 30st), Data-Platform-SRE (2026-03-27 - 2026-04-17), ServiceOps-Services-Oids, ServiceOps new, Data-Engineering-Radar, Event-Platform
JAllemandou added a comment to T418466: Create a data product of IP range to owner/provenance label.

@KCVelaga_WMF let's talk about how to categorize IPs. I think creating a table with ranges is really sub-optimal and we could do better, even for a temporary solution.

Mar 9 2026, 8:07 AM · Data-Engineering (Q4 FS25/26 April 1st - June 30st)
JAllemandou closed T418455: haproxy: capture `x_trusted_request` in webrequest data set, a subtask of T417778: rest gateway: enforce rate limits (stage one), as Resolved.
Mar 9 2026, 7:45 AM · MediaWiki-Platform-Team (Radar), OKR-Work, MW-Interfaces-Team
JAllemandou closed T418455: haproxy: capture `x_trusted_request` in webrequest data set as Resolved.

I validated the data this morning.
This is done.

Mar 9 2026, 7:45 AM · MediaWiki-Platform-Team (Radar), OKR-Work, MW-Interfaces-Team

Mar 5 2026

JAllemandou added a comment to T419050: Optimize enqueueing of refine_webrequest_hourly pipeline.

I see how the change defined above has an impact on SLAs: for an SLA of 5h, if we're waiting one hour more than before to get source data, we alert sooner than before in relation to the source data.
I don't see how this would affect sensor timeouts, as our default values are very long (some might be infinite?). It might not be the same in different airflow instances.

Mar 5 2026, 4:57 PM · Data-Engineering (Q4 FS25/26 April 1st - June 30st), Data-Platform-SRE (2026-03-27 - 2026-04-17)

Mar 4 2026

JAllemandou moved T417864: haproxy: capture x-wmf-* headers in webrequest data set from Next Up to In progress on the Data-Engineering (Q3 FY25/26 January 1st - March 31th) board.
Mar 4 2026, 10:17 AM · Data-Engineering (Q3 FY25/26 January 1st - March 31th), Traffic, MediaWiki-Platform-Team (Radar), OKR-Work, MW-Interfaces-Team
JAllemandou claimed T417864: haproxy: capture x-wmf-* headers in webrequest data set.
Mar 4 2026, 10:16 AM · Data-Engineering (Q3 FY25/26 January 1st - March 31th), Traffic, MediaWiki-Platform-Team (Radar), OKR-Work, MW-Interfaces-Team
JAllemandou closed T418551: HdfsTotalFilesHeap warning as Resolved.
Mar 4 2026, 10:13 AM · Data-Engineering (Q3 FY25/26 January 1st - March 31th), Data-Platform-SRE (2026-02-13 - 2026-03-06)
JAllemandou closed T418152: Reduce noise from HdfsRpcQueueLength alert as Resolved.
Mar 4 2026, 10:13 AM · Data-Engineering (Q3 FY25/26 January 1st - March 31th), Data-Platform-SRE (2026-02-13 - 2026-03-06), Data-Engineering-Radar
JAllemandou moved T418551: HdfsTotalFilesHeap warning from To Be Deployed to Done on the Data-Platform-SRE (2026-02-13 - 2026-03-06) board.
Mar 4 2026, 10:12 AM · Data-Engineering (Q3 FY25/26 January 1st - March 31th), Data-Platform-SRE (2026-02-13 - 2026-03-06)
JAllemandou moved T418152: Reduce noise from HdfsRpcQueueLength alert from Needs Review to Done on the Data-Platform-SRE (2026-02-13 - 2026-03-06) board.
Mar 4 2026, 10:12 AM · Data-Engineering (Q3 FY25/26 January 1st - March 31th), Data-Platform-SRE (2026-02-13 - 2026-03-06), Data-Engineering-Radar

Mar 3 2026

JAllemandou placed T418466: Create a data product of IP range to owner/provenance label up for grabs.
Mar 3 2026, 9:02 PM · Data-Engineering (Q4 FS25/26 April 1st - June 30st)