mforns (Marcel Ruiz Forns)
Software Engineer @ Analytics

User Details

User Since
Nov 7 2014, 8:52 PM (596 w, 1 d)
Availability
Available
IRC Nick
mforns
LDAP User
Mforns
MediaWiki User
Mforns (WMF) [ Global Accounts ]

Recent Activity

Wed, Apr 8

mforns added a comment to T417694: Perform a one-time clean up of retained data sets in event_sanitize.

I'd be totally in favor of setting a long retention period for event_sanitized.
The overall plan looks great to me!

Wed, Apr 8, 6:12 PM · Essential-Work, Data-Engineering (Q4 FS25/26 April 1st - June 30st)
mforns updated the task description for T421735: Backfill datasets affected by Nov 2025 automated traffic incident.
Wed, Apr 8, 1:29 PM · Data-Engineering (Q4 FS25/26 April 1st - June 30st)

Tue, Apr 7

mforns moved T348963: DagProperties don't automatically update Airflow variables from In progress to Blocked/Paused on the Data-Engineering (Q4 FS25/26 April 1st - June 30st) board.
Tue, Apr 7, 3:59 PM · Data-Engineering (Q4 FS25/26 April 1st - June 30st), Patch-For-Review, Structured-Data-Backlog
mforns moved T420412: Implement list of JA3N-JA4H pairs to be tagged as automated into the bot detection pipeline from In Review to Done on the Data-Engineering (Q4 FS25/26 April 1st - June 30st) board.
Tue, Apr 7, 3:58 PM · Data-Engineering (Q4 FS25/26 April 1st - June 30st)
mforns added a comment to T420412: Implement list of JA3N-JA4H pairs to be tagged as automated into the bot detection pipeline.

The deployment finished successfully.
The pageview and unique devices data will be using the new JA3N-JA4H list starting on 2026-04-07T11:00:00.

Tue, Apr 7, 3:58 PM · Data-Engineering (Q4 FS25/26 April 1st - June 30st)

Fri, Apr 3

mforns added a comment to T420412: Implement list of JA3N-JA4H pairs to be tagged as automated into the bot detection pipeline.

I prepared a deployment plan for the automated traffic detection changes:

  1. Get approval for the airflow-dags changes.
  2. Pause webrequest_actor_metrics_hourly DAG in Airflow UI.
  3. Wait until webrequest_actor jobs have finished.
  4. Merge airflow-dags changes.
  5. Make sure the Blunderbuss pipeline runs.
  6. ALTER TABLE wmf.webrequest_actor_metrics_hourly ADD COLUMNS (bot_ja3n_ja4h_pageview_count bigint COMMENT 'Number of pageviews with suspicious JA3N+JA4H pairs');
  7. ALTER TABLE wmf.webrequest_actor_metrics_rollup_hourly ADD COLUMNS (bot_ja3n_ja4h_pageview_share double COMMENT 'Percentage of pageviews with suspicious JA3N+JA4H pairs');
  8. Unpause webrequest_actor_metrics_hourly DAG in Airflow UI.
  9. Quickly vet the data generated by the production DAGs.

I don't think it's a good idea to deploy today, a Friday before a long weekend, with most of the team on holiday. So, I'll do it when we come back next week.

Fri, Apr 3, 7:23 PM · Data-Engineering (Q4 FS25/26 April 1st - June 30st)
mforns added a comment to T420412: Implement list of JA3N-JA4H pairs to be tagged as automated into the bot detection pipeline.

I've finished the tests.
Hamid and I have checked that both actor counts and pageview counts for different dates within the incident match Hamid's analysis.
Given this, and the 2 approvals for the code, I just merged https://gitlab.wikimedia.org/repos/data-engineering/refinery-private/-/merge_requests/2

Fri, Apr 3, 2:44 PM · Data-Engineering (Q4 FS25/26 April 1st - June 30st)

Thu, Apr 2

mforns updated the task description for T421735: Backfill datasets affected by Nov 2025 automated traffic incident.
Thu, Apr 2, 3:09 PM · Data-Engineering (Q4 FS25/26 April 1st - June 30st)
mforns updated subscribers of T421735: Backfill datasets affected by Nov 2025 automated traffic incident.
Thu, Apr 2, 2:26 PM · Data-Engineering (Q4 FS25/26 April 1st - June 30st)

Mon, Mar 30

mforns moved T421735: Backfill datasets affected by Nov 2025 automated traffic incident from Next Up to In progress on the Data-Engineering (Q3 FY25/26 January 1st - March 31th) board.
Mon, Mar 30, 3:13 PM · Data-Engineering (Q4 FS25/26 April 1st - June 30st)
mforns added a comment to T421735: Backfill datasets affected by Nov 2025 automated traffic incident.

Here's the backfilling plan document.

Mon, Mar 30, 2:56 PM · Data-Engineering (Q4 FS25/26 April 1st - June 30st)
mforns created T421735: Backfill datasets affected by Nov 2025 automated traffic incident.
Mon, Mar 30, 2:55 PM · Data-Engineering (Q4 FS25/26 April 1st - June 30st)

Thu, Mar 26

mforns updated subscribers of T419267: The revision_seconds_to_identity_revert field in wmf.mediawiki_history has sometimes negative values.

@Ahoelzl @GGoncalves-WMF
@CMyrick-WMF reached out to us, interested in this task, since she will soon be working on time-to-revert insights, and this fix would benefit her work.

Thu, Mar 26, 5:17 PM · Data-Engineering (Q4 FS25/26 April 1st - June 30st)

Wed, Mar 25

mforns added a comment to T420412: Implement list of JA3N-JA4H pairs to be tagged as automated into the bot detection pipeline.

I started the tests, using Jan 14th as the test date.
I will regenerate the whole pipeline, from webrequest_actor_metrics_hourly to unique_devices_*_daily.
If there's more time, I will test other dates (Dec 1st, Dec 26th, Feb 19th), but only up to pageview_actor.

Wed, Mar 25, 9:01 AM · Data-Engineering (Q4 FS25/26 April 1st - June 30st)

Mon, Mar 23

mforns added a comment to T420412: Implement list of JA3N-JA4H pairs to be tagged as automated into the bot detection pipeline.

@JAllemandou Ah! I understand now.
Like:

  • Initially, we set valid_until to NULL.
  • Then, the day we decide to turn off some of those JA3N/JA4H pairs, we manually set their valid_until to that date.

OK, I will do it like this.
🙏
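The valid_until convention described above could be sketched roughly as follows. This is only an illustration, not the pipeline's actual code: the `PairRule` class and the `is_active` helper are hypothetical names, and treating valid_until as an exclusive upper bound is an assumption.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional


@dataclass(frozen=True)
class PairRule:
    """One JA3N/JA4H pair tagged as automated (hypothetical structure)."""
    ja3n: str
    ja4h: str
    valid_until: Optional[datetime] = None  # None (NULL) means "still in force"


def is_active(rule: PairRule, request_time: datetime) -> bool:
    """Return True if the rule applies to a request seen at request_time.

    Assumption: valid_until is exclusive, and it is only ever set by hand
    once the team decides to turn the pair off.
    """
    return rule.valid_until is None or request_time < rule.valid_until


# A rule starts with no end date, so it applies to every request.
rule = PairRule(ja3n="ja3n-example", ja4h="ja4h-example")
print(is_active(rule, datetime(2026, 4, 7, 11)))  # True

# Later, the pair is retired by manually setting valid_until.
retired = PairRule(rule.ja3n, rule.ja4h, valid_until=datetime(2026, 5, 1))
print(is_active(retired, datetime(2026, 4, 30)))  # True
print(is_active(retired, datetime(2026, 5, 2)))   # False
```

Nothing expires on its own here: a pair only stops matching after someone writes a date into valid_until.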

Mon, Mar 23, 9:21 PM · Data-Engineering (Q4 FS25/26 April 1st - June 30st)
mforns added a comment to T420412: Implement list of JA3N-JA4H pairs to be tagged as automated into the bot detection pipeline.

but I don't think we should automatically disable the filtering when we reach the valid_until date.

So we keep it NULL for now and, once an audit decides the pair should no longer be used, we add the value?

How about the table schema being created_at timestamp, updated_at timestamp, ja3n string, ja4h string, reference string (phab task),
and at each audit, we manually add/delete/update the records that we decide?
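As a rough sketch of that proposal, the record layout and the manual add/update/delete steps of an audit might look like this. The field names come from the schema suggested above; the in-memory dictionary and the `upsert_pair` and `delete_pair` helpers are hypothetical stand-ins for whatever would actually edit the table.

```python
from dataclasses import dataclass, replace
from datetime import datetime
from typing import Dict, Tuple

Key = Tuple[str, str]  # (ja3n, ja4h)


@dataclass(frozen=True)
class PairRecord:
    created_at: datetime
    updated_at: datetime
    ja3n: str
    ja4h: str
    reference: str  # e.g. the Phabricator task that justified the pair


def upsert_pair(table: Dict[Key, PairRecord], ja3n: str, ja4h: str,
                reference: str, now: datetime) -> None:
    """Add a new pair, or refresh updated_at/reference of an existing one."""
    key = (ja3n, ja4h)
    existing = table.get(key)
    if existing is None:
        table[key] = PairRecord(now, now, ja3n, ja4h, reference)
    else:
        table[key] = replace(existing, updated_at=now, reference=reference)


def delete_pair(table: Dict[Key, PairRecord], ja3n: str, ja4h: str) -> None:
    """Remove a pair that an audit decided to stop filtering."""
    table.pop((ja3n, ja4h), None)


audit_table: Dict[Key, PairRecord] = {}
upsert_pair(audit_table, "ja3n-a", "ja4h-a", "T420412", datetime(2026, 3, 23))
upsert_pair(audit_table, "ja3n-a", "ja4h-a", "T420412", datetime(2026, 6, 1))
record = audit_table[("ja3n-a", "ja4h-a")]
print(record.created_at.date(), record.updated_at.date())  # 2026-03-23 2026-06-01

delete_pair(audit_table, "ja3n-a", "ja4h-a")
print(len(audit_table))  # 0
```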

Mon, Mar 23, 5:20 PM · Data-Engineering (Q4 FS25/26 April 1st - June 30st)

Fri, Mar 20

mforns added a comment to T420412: Implement list of JA3N-JA4H pairs to be tagged as automated into the bot detection pipeline.

OK, will switch to an Iceberg table and put it in wmf_traffic! Thanks for the feedback.
The valid_until field makes sense as a reminder, but I don't think we should automatically disable the filtering when we reach the valid_until date.
Otherwise, we could revert to the previous incident's traffic levels without noticing.
In our last meeting with @GGoncalves-WMF we informally agreed that there would be a periodic audit of this list, and other automated traffic parameters, and we would reevaluate, remove and add as necessary.

Fri, Mar 20, 5:09 PM · Data-Engineering (Q4 FS25/26 April 1st - June 30st)

Thu, Mar 19

mforns added a comment to T420412: Implement list of JA3N-JA4H pairs to be tagged as automated into the bot detection pipeline.

@APizzata-WMF Yes, you're right! Hm, maybe we don't even need to add it as a dataset in Airflow, since we can assume it's always going to exist and be properly populated? It's a static dataset that doesn't see any regular updates, so I think we should not use Airflow sensors for it. So, I guess we can skip table maintenance altogether?

Thu, Mar 19, 11:41 AM · Data-Engineering (Q4 FS25/26 April 1st - June 30st)

Wed, Mar 18

mforns added a comment to T420412: Implement list of JA3N-JA4H pairs to be tagged as automated into the bot detection pipeline.

Thanks for the feedback!

Wed, Mar 18, 5:20 PM · Data-Engineering (Q4 FS25/26 April 1st - June 30st)

Tue, Mar 17

mforns moved T420412: Implement list of JA3N-JA4H pairs to be tagged as automated into the bot detection pipeline from Next Up to In progress on the Data-Engineering (Q3 FY25/26 January 1st - March 31th) board.
Tue, Mar 17, 8:44 PM · Data-Engineering (Q4 FS25/26 April 1st - June 30st)
mforns updated subscribers of T420412: Implement list of JA3N-JA4H pairs to be tagged as automated into the bot detection pipeline.

One question related to creating new tables in the new Iceberg databases.
I want to create a new table that stores the JA3N-JA4H pairs that we'll mark as automated traffic in our bot detection pipeline.
It would be nice to store them as TSV so that, whenever we update them, a refinery deploy (or manual sync) is sufficient to update the contents of the table.
However, if we create that new table in Iceberg, IIUC we cannot do that; we would need a delete-select-insert from the file into the Iceberg table.
I would be OK with using an external Hive table for this, so it stays simple.
But, then, can we still put this table in the (in theory exclusively) Iceberg database wmf_traffic (which Antonio and I think is the proper location of the table)?
I think @JAllemandou has strong opinions on this.
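To illustrate the delete-select-insert refresh mentioned above, here is a small in-memory sketch. It assumes a two-column TSV with a header row (ja3n, ja4h); the file layout, the `read_pairs` and `full_refresh` helpers, and the use of a Python list in place of the Iceberg table are all hypothetical.

```python
import csv
import io
from typing import List, Tuple

Pair = Tuple[str, str]  # (ja3n, ja4h)


def read_pairs(tsv_text: str) -> List[Pair]:
    """Parse the TSV contents into (ja3n, ja4h) tuples."""
    reader = csv.reader(io.StringIO(tsv_text), delimiter="\t")
    next(reader)  # skip the assumed header row: ja3n<TAB>ja4h
    return [(row[0], row[1]) for row in reader if row]


def full_refresh(table: List[Pair], tsv_text: str) -> None:
    """Replace the table contents with the rows found in the file."""
    new_rows = read_pairs(tsv_text)  # "select" from the file
    table.clear()                    # "delete" the old contents
    table.extend(new_rows)           # "insert" the new contents


table: List[Pair] = [("old-ja3n", "old-ja4h")]
full_refresh(table, "ja3n\tja4h\nja3n-a\tja4h-a\nja3n-b\tja4h-b\n")
print(table)  # [('ja3n-a', 'ja4h-a'), ('ja3n-b', 'ja4h-b')]
```

With an external Hive table this step would not be needed, since replacing the file would be enough; with Iceberg, something equivalent to `full_refresh` would have to run after every change to the file.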

Tue, Mar 17, 8:43 PM · Data-Engineering (Q4 FS25/26 April 1st - June 30st)
mforns created T420412: Implement list of JA3N-JA4H pairs to be tagged as automated into the bot detection pipeline.
Tue, Mar 17, 8:26 PM · Data-Engineering (Q4 FS25/26 April 1st - June 30st)

Mon, Mar 16

mforns moved T351225: Productionized Edit Types from In progress to Ready to Deploy on the Data-Engineering (Q3 FY25/26 January 1st - March 31th) board.
Mon, Mar 16, 4:16 PM · Data-Engineering (Q4 FS25/26 April 1st - June 30st), Patch-For-Review, Research-Freezer, Event-Platform, Research-engineering
mforns moved T420069: Schedule three new monthly DBT models for Movement Insights from Next Up to In progress on the Data-Engineering (Q3 FY25/26 January 1st - March 31th) board.
Mon, Mar 16, 4:14 PM · OKR-Work (WE1 FY2025-26), Data-Engineering (Q3 FY25/26 January 1st - March 31th)
mforns moved T419925: Build a set of configurable pre-scheduled DBT Airflow DAGs executing dbt-jobs models from Next Up to In progress on the Data-Engineering (Q3 FY25/26 January 1st - March 31th) board.
Mon, Mar 16, 4:13 PM · Data-Engineering (Q3 FY25/26 January 1st - March 31th)
mforns moved T419594: Implement more fine-grained selection of DBT models in DbtSkeinOperator from Next Up to Done on the Data-Engineering (Q3 FY25/26 January 1st - March 31th) board.
Mon, Mar 16, 4:12 PM · Data-Engineering (Q3 FY25/26 January 1st - March 31th)
mforns moved T418190: Refactor pingback reports pipelines using dbt from Next Up to In progress on the Data-Engineering (Q3 FY25/26 January 1st - March 31th) board.
Mon, Mar 16, 4:12 PM · Data-Engineering (Q4 FS25/26 April 1st - June 30st)
mforns moved T416200: Attribution Research First Experiment from In progress to Ready to Deploy on the Data-Engineering (Q3 FY25/26 January 1st - March 31th) board.
Mon, Mar 16, 4:03 PM · Data-Engineering (Q4 FS25/26 April 1st - June 30st)
mforns renamed T408918: Upgrade mediawiki-event-enrichment jobs to >= Flink 1.20.3 and Java 17 from Upgrade mediawiki-event-enrichment jobs to >= Flink 1.20.2 and Java 17 to Upgrade mediawiki-event-enrichment jobs to >= Flink 1.20.3 and Java 17.
Mon, Mar 16, 3:57 PM · Data-Engineering (Q3 FY25/26 January 1st - March 31th), Patch-For-Review, Event-Platform, Essential-Work
mforns placed T416113: Deploy turnilo to dse-k8s-eqiad up for grabs.
Mon, Mar 16, 3:54 PM · Data-Platform-SRE (2026-03-27 - 2026-04-17), Data-Engineering (Q3 FY25/26 January 1st - March 31th), Patch-For-Review
mforns moved T415202: Introduce a new AQS endpoint to expose video plays from In progress to In Review on the Data-Engineering (Q3 FY25/26 January 1st - March 31th) board.
Mon, Mar 16, 3:51 PM · Data-Engineering (Q4 FS25/26 April 1st - June 30st), AQS2.0
mforns moved T420046: Add Human-Bot Alert Runbook Link to Alert Email. from In progress to Done on the Data-Engineering (Q3 FY25/26 January 1st - March 31th) board.
Mon, Mar 16, 3:50 PM · Data-Engineering (Q3 FY25/26 January 1st - March 31th)
mforns moved T405379: Clean up artifacts.yaml from Ready to Deploy to Done on the Data-Engineering (Q3 FY25/26 January 1st - March 31th) board.
Mon, Mar 16, 3:50 PM · Data-Engineering (Q3 FY25/26 January 1st - March 31th)
mforns moved T416470: Dan and Thomas can deploy backports from In progress to Done on the Data-Engineering (Q3 FY25/26 January 1st - March 31th) board.
Mon, Mar 16, 3:48 PM · Data-Engineering (Q3 FY25/26 January 1st - March 31th)

Mar 11 2026

mforns moved T348963: DagProperties don't automatically update Airflow variables from Next Up to In progress on the Data-Engineering (Q3 FY25/26 January 1st - March 31th) board.
Mar 11 2026, 9:18 AM · Data-Engineering (Q4 FS25/26 April 1st - June 30st), Patch-For-Review, Structured-Data-Backlog
mforns claimed T348963: DagProperties don't automatically update Airflow variables.
Mar 11 2026, 9:17 AM · Data-Engineering (Q4 FS25/26 April 1st - June 30st), Patch-For-Review, Structured-Data-Backlog
mforns edited projects for T348963: DagProperties don't automatically update Airflow variables, added: Data-Engineering (Q3 FY25/26 January 1st - March 31th); removed Data-Engineering.
Mar 11 2026, 9:17 AM · Data-Engineering (Q4 FS25/26 April 1st - June 30st), Patch-For-Review, Structured-Data-Backlog

Mar 6 2026

mforns added a comment to T419267: The revision_seconds_to_identity_revert field in wmf.mediawiki_history has sometimes negative values.

When troubleshooting and fixing this, we should also consider solving T266374,
since diving into the mediawiki_history code always takes some time and effort.
If we solve the 2 issues in one go, we can save significant context-switching time.

Mar 6 2026, 7:54 PM · Data-Engineering (Q4 FS25/26 April 1st - June 30st)
mforns created T419267: The revision_seconds_to_identity_revert field in wmf.mediawiki_history has sometimes negative values.
Mar 6 2026, 5:39 PM · Data-Engineering (Q4 FS25/26 April 1st - June 30st)

Mar 4 2026

mforns added a comment to T419050: Optimize enqueueing of refine_webrequest_hourly pipeline.

In theory, the default sensor timeout is 7 days.
I haven't found anywhere in the code where we override this value.
Do you know why our sensors time out so early?
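For reference, the 7-day figure corresponds to Airflow's stock sensor timeout of 60 * 60 * 24 * 7 seconds. The snippet below is only a simplified model of that check, written without importing Airflow; the `has_timed_out` helper is a hypothetical name, not an Airflow API.

```python
from datetime import datetime, timedelta

# Airflow's stock sensor timeout, expressed as a timedelta.
DEFAULT_SENSOR_TIMEOUT = timedelta(days=7)


def has_timed_out(started_at: datetime, now: datetime,
                  timeout: timedelta = DEFAULT_SENSOR_TIMEOUT) -> bool:
    """Simplified model: a sensor fails once it has waited longer than timeout."""
    return now - started_at > timeout


print(int(DEFAULT_SENSOR_TIMEOUT.total_seconds()))  # 604800

start = datetime(2026, 3, 1, 0, 0)
print(has_timed_out(start, datetime(2026, 3, 4, 0, 0)))  # False
print(has_timed_out(start, datetime(2026, 3, 9, 0, 0)))  # True
```

If sensors are failing well before 604800 seconds have elapsed, the limit is presumably being set somewhere other than this default.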

Mar 4 2026, 7:13 PM · Data-Engineering (Q4 FS25/26 April 1st - June 30st), Data-Platform-SRE (2026-03-27 - 2026-04-17)

Mar 2 2026

mforns added a comment to T418754: Do multiple code and data clean ups for content tables.

LGTM @xcollazo

Mar 2 2026, 7:07 PM · Patch-For-Review, Data-Engineering (Q3 FY25/26 January 1st - March 31th)

Dec 1 2025

mforns moved T409584: Productionize JA3N-UA table to improve bot detection from In progress to Done on the Data-Engineering (Q2 FY25/26 October 1st - December 31th) board.
Dec 1 2025, 4:33 PM · Data-Engineering (Q2 FY25/26 October 1st - December 31th), Patch-For-Review, Data-Engineering-Roadmap, OKR-Work, Epic

Nov 27 2025

mforns moved T409584: Productionize JA3N-UA table to improve bot detection from Next Up to In progress on the Data-Engineering (Q2 FY25/26 October 1st - December 31th) board.
Nov 27 2025, 6:35 PM · Data-Engineering (Q2 FY25/26 October 1st - December 31th), Patch-For-Review, Data-Engineering-Roadmap, OKR-Work, Epic
mforns added a project to T409584: Productionize JA3N-UA table to improve bot detection: Data-Engineering (Q2 FY25/26 October 1st - December 31th).
Nov 27 2025, 6:35 PM · Data-Engineering (Q2 FY25/26 October 1st - December 31th), Patch-For-Review, Data-Engineering-Roadmap, OKR-Work, Epic
mforns added a comment to T410962: Provision Global Editor Metrics tables & endpoints.

Thank you @Eevans!

Nov 27 2025, 10:39 AM · Data-Engineering (Q2 FY25/26 October 1st - December 31th), Data-Persistence

Nov 25 2025

mforns added a comment to T406069: Global Editor Metrics - Druid mediawiki_history_reduced changes.

Makes sense @JAllemandou!

Nov 25 2025, 10:32 AM · Data-Engineering (Q3 FY25/26 January 1st - March 31th), OKR-Work, MediaWiki-Page-derived-data

Nov 21 2025

mforns added a comment to T410768: Update Commons Impact Metrics allow-list November 2025.

Merged and deployed the update, thanks!

Nov 21 2025, 7:11 PM · Data-Engineering, Commons-Impact-Metrics-Requests, Commons-Impact-Metrics

Nov 20 2025

mforns moved T408405: Design and populate temporary table for JA3N analysis from In progress to Done on the Data-Engineering (Q2 FY25/26 October 1st - December 31th) board.
Nov 20 2025, 12:05 PM · Data-Engineering (Q2 FY25/26 October 1st - December 31th)
mforns moved T409577: Analyze JA3N data and generate JA3N-UA table from Next Up to In progress on the Data-Engineering (Q2 FY25/26 October 1st - December 31th) board.
Nov 20 2025, 12:05 PM · Data-Engineering (Q4 FS25/26 April 1st - June 30st), Data-Engineering-Roadmap, OKR-Work, Epic
mforns added a project to T409577: Analyze JA3N data and generate JA3N-UA table: Data-Engineering (Q2 FY25/26 October 1st - December 31th).
Nov 20 2025, 12:05 PM · Data-Engineering (Q4 FS25/26 April 1st - June 30st), Data-Engineering-Roadmap, OKR-Work, Epic
mforns reassigned T409577: Analyze JA3N data and generate JA3N-UA table from mforns to Hghani.
Nov 20 2025, 12:04 PM · Data-Engineering (Q4 FS25/26 April 1st - June 30st), Data-Engineering-Roadmap, OKR-Work, Epic

Nov 18 2025

mforns created T410431: Troubleshoot duplicates issue in mw_content_merge_events_to_mw_content_history_daily.
Nov 18 2025, 5:17 PM · Data-Engineering (Q3 FY25/26 January 1st - March 31th)

Nov 10 2025

mforns moved T405041: Global Editor Metrics - HTTP API endpoints from Next Up to In progress on the Data-Engineering (Q2 FY25/26 October 1st - December 31th) board.
Nov 10 2025, 4:19 PM · Data-Engineering (Q3 FY25/26 January 1st - March 31th), Patch-For-Review, OKR-Work, MediaWiki-Page-derived-data, Growth-Team, Wikipedia-Android-App-Backlog, Wikipedia-iOS-App-Backlog
mforns moved T406509: Commons Impact Metrics has no data for September snapshot from In Review to Done on the Data-Engineering (Q2 FY25/26 October 1st - December 31th) board.
Nov 10 2025, 4:19 PM · Data-Engineering (Q2 FY25/26 October 1st - December 31th)

Nov 7 2025

mforns created T409584: Productionize JA3N-UA table to improve bot detection.
Nov 7 2025, 6:57 PM · Data-Engineering (Q2 FY25/26 October 1st - December 31th), Patch-For-Review, Data-Engineering-Roadmap, OKR-Work, Epic
mforns created T409577: Analyze JA3N data and generate JA3N-UA table.
Nov 7 2025, 6:10 PM · Data-Engineering (Q4 FS25/26 April 1st - June 30st), Data-Engineering-Roadmap, OKR-Work, Epic

Nov 3 2025

mforns moved T408404: Vet JA3N data in webrequest and pageview_actor from In progress to Done on the Data-Engineering (Q2 FY25/26 October 1st - December 31th) board.
Nov 3 2025, 4:20 PM · Data-Engineering (Q2 FY25/26 October 1st - December 31th)
mforns added a comment to T408404: Vet JA3N data in webrequest and pageview_actor.

Summary of JA3N data vetting:

Nov 3 2025, 4:20 PM · Data-Engineering (Q2 FY25/26 October 1st - December 31th)

Oct 28 2025

mforns moved T408561: Update Commons Impact Metrics allow-list October 2025 from Next Up to Done on the Data-Engineering (Q2 FY25/26 October 1st - December 31th) board.
Oct 28 2025, 6:58 PM · Data-Engineering (Q2 FY25/26 October 1st - December 31th), Commons-Impact-Metrics-Requests, Commons-Impact-Metrics
mforns edited projects for T408561: Update Commons Impact Metrics allow-list October 2025, added: Data-Engineering (Q2 FY25/26 October 1st - December 31th); removed Data-Engineering.
Oct 28 2025, 6:58 PM · Data-Engineering (Q2 FY25/26 October 1st - December 31th), Commons-Impact-Metrics-Requests, Commons-Impact-Metrics

Oct 27 2025

mforns moved T408405: Design and populate temporary table for JA3N analysis from Next Up to In progress on the Data-Engineering (Q2 FY25/26 October 1st - December 31th) board.
Oct 27 2025, 3:19 PM · Data-Engineering (Q2 FY25/26 October 1st - December 31th)
mforns created T408405: Design and populate temporary table for JA3N analysis.
Oct 27 2025, 3:19 PM · Data-Engineering (Q2 FY25/26 October 1st - December 31th)
mforns moved T408404: Vet JA3N data in webrequest and pageview_actor from Next Up to In progress on the Data-Engineering (Q2 FY25/26 October 1st - December 31th) board.
Oct 27 2025, 3:17 PM · Data-Engineering (Q2 FY25/26 October 1st - December 31th)
mforns created T408404: Vet JA3N data in webrequest and pageview_actor.
Oct 27 2025, 3:16 PM · Data-Engineering (Q2 FY25/26 October 1st - December 31th)

Oct 23 2025

mforns updated the task description for T407893: Technical work for SDS1.3.7 Incorporate Edge Signal.
Oct 23 2025, 12:42 PM · Data-Engineering-Roadmap, OKR-Work, Epic
mforns updated the task description for T407893: Technical work for SDS1.3.7 Incorporate Edge Signal.
Oct 23 2025, 12:42 PM · Data-Engineering-Roadmap, OKR-Work, Epic
mforns renamed T407893: Technical work for SDS1.3.7 Incorporate Edge Signal from [Hypothesis] SDS1.3.7 Incorporate Edge Signal to Technical work for SDS1.3.7 Incorporate Edge Signal.
Oct 23 2025, 12:40 PM · Data-Engineering-Roadmap, OKR-Work, Epic

Oct 21 2025

mforns created T407893: Technical work for SDS1.3.7 Incorporate Edge Signal.
Oct 21 2025, 6:54 PM · Data-Engineering-Roadmap, OKR-Work, Epic
mforns moved T400380: MediaWiki\Revision\RevisionAccessException: Unable to load fresh row for rev_id: {rev_id} from In Review to Blocked/Paused on the Data-Engineering (Q2 FY25/26 October 1st - December 31th) board.
Oct 21 2025, 3:32 PM · Data-Engineering (Q4 FS25/26 April 1st - June 30st), MW-1.46-notes (1.46.0-wmf.21; 2026-03-24), Patch-For-Review, MW-Interfaces-Team, Event-Platform, MediaWiki-DomainEvents, Unstewarded-production-error, MediaWiki-Core-Revision-backend, Wikimedia-production-error
mforns moved T405952: EventgateProduceRateStop / EventGateProduceRateAnomaly alert should be active datacenter aware from Next Up to In progress on the Data-Engineering (Q2 FY25/26 October 1st - December 31th) board.
Oct 21 2025, 3:08 PM · Data-Engineering (Q2 FY25/26 October 1st - December 31th), Observability-Alerting, Event-Platform
mforns moved T397330: mediawiki.content_history: flink applications experiencing frequent restarts due to JobManager OOMs from In progress to Done on the Data-Engineering (Q2 FY25/26 October 1st - December 31th) board.
Oct 21 2025, 3:08 PM · Data-Engineering (Q2 FY25/26 October 1st - December 31th), Event-Platform
mforns moved T401725: Deploy mediawiki-event-enrichment Flink jobs running 1.20 from In progress to Done on the Data-Engineering (Q2 FY25/26 October 1st - December 31th) board.
Oct 21 2025, 3:08 PM · Data-Engineering (Q2 FY25/26 October 1st - December 31th), Event-Platform, Dumps 2.0 (Kanban Board)

Oct 15 2025

mforns added a comment to T406509: Commons Impact Metrics has no data for September snapshot.

After some troubleshooting I saw that, when we added the linktarget table as a datasource for Commons Impact Metrics, we forgot to add the corresponding sensor.
This made it so that the September DAG run started before the linktarget data was properly loaded to the data lake, and so the CIM job produced empty results.
The MR above adds the proper sensor to the DAG.
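The failure mode described above can be modeled in a few lines. This is a toy sketch, not the DAG code: `other_table` is a placeholder datasource name, and the `can_start` and `missing_sensors` helpers are hypothetical.

```python
from typing import Set


def can_start(sensed_inputs: Set[str], loaded_inputs: Set[str]) -> bool:
    """A run starts as soon as every *sensed* input is loaded."""
    return sensed_inputs <= loaded_inputs


def missing_sensors(job_inputs: Set[str], sensed_inputs: Set[str]) -> Set[str]:
    """Inputs the job reads but no sensor waits for."""
    return job_inputs - sensed_inputs


job_inputs = {"linktarget", "other_table"}  # what the job actually reads
sensed = {"other_table"}                    # linktarget sensor was forgotten
loaded = {"other_table"}                    # linktarget not loaded yet

# The run is allowed to start even though linktarget is not there yet.
print(can_start(sensed, loaded))            # True
print(missing_sensors(job_inputs, sensed))  # {'linktarget'}

# With the sensor added, the run waits until linktarget is loaded.
print(can_start(job_inputs, loaded))        # False
```

A check along the lines of `missing_sensors` could in principle run in a test, so that a newly added datasource without a sensor is caught before a run produces empty results.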

Oct 15 2025, 9:27 PM · Data-Engineering (Q2 FY25/26 October 1st - December 31th)
mforns moved T406509: Commons Impact Metrics has no data for September snapshot from Urgent to In Review on the Data-Engineering (Q2 FY25/26 October 1st - December 31th) board.
Oct 15 2025, 9:25 PM · Data-Engineering (Q2 FY25/26 October 1st - December 31th)

Oct 13 2025

mforns moved T405667: Backfill datasets affected by automated traffic detection issues from In progress to Done on the Data-Engineering (Q2 FY25/26 October 1st - December 31th) board.
Oct 13 2025, 3:15 PM · Data-Engineering (Q2 FY25/26 October 1st - December 31th)
mforns moved T402645: Prepare webrequest derived Airflow DAGs for large scale re-runs from In progress to Done on the Data-Engineering (Q2 FY25/26 October 1st - December 31th) board.
Oct 13 2025, 3:15 PM · Data-Engineering (Q2 FY25/26 October 1st - December 31th), Patch-For-Review

Oct 8 2025

mforns updated the task description for T405667: Backfill datasets affected by automated traffic detection issues.
Oct 8 2025, 8:13 PM · Data-Engineering (Q2 FY25/26 October 1st - December 31th)
mforns added a comment to T405667: Backfill datasets affected by automated traffic detection issues.

OK, I think this time it worked.

  • We rolled back to the Iceberg snapshot taken before the backfill process corrupted the data.
  • Then we copied April (still not backfilled) into a temp table in the wmf_staging database.
  • Then we rolled the tables forward to their present state.
  • And finally we deleted the April data and copied the uncorrupted data from the temp table into the April gap.
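One way to picture the four steps above is as an ordered list of Spark SQL statements. The helper below only builds the statements as strings and is a sketch under assumptions: using Iceberg's `set_current_snapshot` procedure for the roll-forward is a guess at how that step could be done, and the temp table name, the `day` column in the filter, and the `restore_plan` helper are hypothetical. The snapshot IDs are borrowed from the spark-sql output in this thread purely as sample values.

```python
from typing import List


def restore_plan(table: str, good_snapshot_id: int, present_snapshot_id: int,
                 temp_table: str, period_filter: str) -> List[str]:
    """Build the statements for: roll back, save period, roll forward, patch."""
    return [
        # 1. Go back to the snapshot taken before the data was corrupted.
        f"CALL spark_catalog.system.rollback_to_snapshot('{table}', {good_snapshot_id})",
        # 2. Save the still-good period into a temp table.
        f"CREATE TABLE {temp_table} AS SELECT * FROM {table} WHERE {period_filter}",
        # 3. Return the table to its present state.
        f"CALL spark_catalog.system.set_current_snapshot('{table}', {present_snapshot_id})",
        # 4. Replace the corrupted period with the saved copy.
        f"DELETE FROM {table} WHERE {period_filter}",
        f"INSERT INTO {table} SELECT * FROM {temp_table}",
    ]


plan = restore_plan(
    table="wmf_readership.unique_devices_per_domain_monthly",
    good_snapshot_id=2325163397903337906,
    present_snapshot_id=1441360873772091690,
    temp_table="wmf_staging.unique_devices_april_tmp",  # hypothetical name
    period_filter="day >= DATE '2025-04-01' AND day < DATE '2025-05-01'",  # hypothetical column
)
for statement in plan:
    print(statement + ";")
```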
Oct 8 2025, 8:12 PM · Data-Engineering (Q2 FY25/26 October 1st - December 31th)
mforns added a comment to T405667: Backfill datasets affected by automated traffic detection issues.

We executed the snapshot revert commands with success:

spark-sql (default)> CALL spark_catalog.system.rollback_to_snapshot('wmf_readership.unique_devices_per_domain_monthly', 2325163397903337906);
25/10/08 18:50:56 WARN BaseTransaction: Failed to load metadata for a committed snapshot, skipping clean-up
previous_snapshot_id	current_snapshot_id
1441360873772091690	2325163397903337906
Time taken: 2.376 seconds, Fetched 1 row(s)
Oct 8 2025, 6:53 PM · Data-Engineering (Q2 FY25/26 October 1st - December 31th)
mforns updated the task description for T405667: Backfill datasets affected by automated traffic detection issues.
Oct 8 2025, 1:43 PM · Data-Engineering (Q2 FY25/26 October 1st - December 31th)

Oct 7 2025

mforns updated the task description for T405667: Backfill datasets affected by automated traffic detection issues.
Oct 7 2025, 11:29 PM · Data-Engineering (Q2 FY25/26 October 1st - December 31th)
mforns updated the task description for T405667: Backfill datasets affected by automated traffic detection issues.
Oct 7 2025, 11:00 PM · Data-Engineering (Q2 FY25/26 October 1st - December 31th)
mforns updated the task description for T405667: Backfill datasets affected by automated traffic detection issues.
Oct 7 2025, 10:23 PM · Data-Engineering (Q2 FY25/26 October 1st - December 31th)
mforns updated the task description for T405667: Backfill datasets affected by automated traffic detection issues.
Oct 7 2025, 8:39 PM · Data-Engineering (Q2 FY25/26 October 1st - December 31th)
mforns updated the task description for T405667: Backfill datasets affected by automated traffic detection issues.
Oct 7 2025, 8:37 PM · Data-Engineering (Q2 FY25/26 October 1st - December 31th)
mforns updated the task description for T405667: Backfill datasets affected by automated traffic detection issues.
Oct 7 2025, 8:21 PM · Data-Engineering (Q2 FY25/26 October 1st - December 31th)
mforns updated the task description for T405667: Backfill datasets affected by automated traffic detection issues.
Oct 7 2025, 7:23 PM · Data-Engineering (Q2 FY25/26 October 1st - December 31th)
mforns updated the task description for T405667: Backfill datasets affected by automated traffic detection issues.
Oct 7 2025, 7:01 PM · Data-Engineering (Q2 FY25/26 October 1st - December 31th)
mforns updated the task description for T405667: Backfill datasets affected by automated traffic detection issues.
Oct 7 2025, 6:59 PM · Data-Engineering (Q2 FY25/26 October 1st - December 31th)
mforns updated the task description for T405667: Backfill datasets affected by automated traffic detection issues.
Oct 7 2025, 5:43 PM · Data-Engineering (Q2 FY25/26 October 1st - December 31th)
mforns updated the task description for T405667: Backfill datasets affected by automated traffic detection issues.
Oct 7 2025, 5:38 PM · Data-Engineering (Q2 FY25/26 October 1st - December 31th)
mforns updated the task description for T405667: Backfill datasets affected by automated traffic detection issues.
Oct 7 2025, 3:46 PM · Data-Engineering (Q2 FY25/26 October 1st - December 31th)
mforns updated the task description for T405667: Backfill datasets affected by automated traffic detection issues.
Oct 7 2025, 3:38 PM · Data-Engineering (Q2 FY25/26 October 1st - December 31th)
mforns updated the task description for T405667: Backfill datasets affected by automated traffic detection issues.
Oct 7 2025, 3:33 PM · Data-Engineering (Q2 FY25/26 October 1st - December 31th)
mforns updated the task description for T405667: Backfill datasets affected by automated traffic detection issues.
Oct 7 2025, 2:41 PM · Data-Engineering (Q2 FY25/26 October 1st - December 31th)
mforns updated the task description for T405667: Backfill datasets affected by automated traffic detection issues.
Oct 7 2025, 12:54 PM · Data-Engineering (Q2 FY25/26 October 1st - December 31th)
mforns updated the task description for T405667: Backfill datasets affected by automated traffic detection issues.
Oct 7 2025, 12:05 PM · Data-Engineering (Q2 FY25/26 October 1st - December 31th)
mforns updated the task description for T405667: Backfill datasets affected by automated traffic detection issues.
Oct 7 2025, 11:57 AM · Data-Engineering (Q2 FY25/26 October 1st - December 31th)
mforns updated the task description for T405667: Backfill datasets affected by automated traffic detection issues.
Oct 7 2025, 11:57 AM · Data-Engineering (Q2 FY25/26 October 1st - December 31th)
mforns updated the task description for T405667: Backfill datasets affected by automated traffic detection issues.
Oct 7 2025, 11:53 AM · Data-Engineering (Q2 FY25/26 October 1st - December 31th)