
AKhatun_WMF (Aisha Khatun)
Data Engineer

Projects

User does not belong to any projects.

User Details

User Since
Apr 20 2021, 8:39 AM (267 w, 5 d)
Availability
Available
IRC Nick
akhatun
LDAP User
AKhatun
MediaWiki User
AKhatun (WMF) [ Global Accounts ]

Personal Accounts:

Check out my website/blog: http://tanny411.github.io/

Recent Activity

Thu, Jun 4

AKhatun_WMF added a comment to T427442: [Data Persistence Design Review] Editor count per page (Attribution API).

Oops, de-drafted. Thanks!

Thu, Jun 4, 4:36 PM · User-Eevans, Data-Persistence-Design-Review, Data-Persistence

Wed, Jun 3

AKhatun_WMF created T428018: Add user status details for cross-wiki users in MWH.
Wed, Jun 3, 4:07 AM · Data-Engineering
AKhatun_WMF updated subscribers of T426316: WE5.3.3b: Contributor Count Per Page [Attribution API].
  • Unless strictly required for editor metrics, I propose to leave out cross-wiki users for now. When we add user-status details in MWH (permanent/temp/anonymous + bot) for cross-wiki users, the editor metrics will be more accurate.
  • Continue to consider the latest revision of an editor in a page to get the most recent bot-ness. This means the bot-ness may not be updated in some pages until another edit is made. But the bot-ness is consistent within that page.
  • Consider bot-by-group and bot-by-name as bot count.
Wed, Jun 3, 3:49 AM · Patch-For-Review, Data-Engineering (Q4 FS25/26 April 1st - June 30st)

Tue, Jun 2

AKhatun_WMF added a comment to T427442: [Data Persistence Design Review] Editor count per page (Attribution API).

Thanks @Eevans! Had a couple of discussions. Changed the field names a bit and added another field. This is what we want to move forward with:

{
  "wiki_id": "enwiki",
  "page_id": 12345,
  "editor_total_count": 127,
  "editor_bot_count": 4,
  "editor_logged_out_count": 21,
  "editor_permanent_count": 93,
  "editor_temporary_count": 9,
  "page_is_deleted": false,
  "updated_at": "2026-05-26 00:00:00.000Z"
}
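
A minimal sketch of how a consumer might sanity-check a record of this shape. It assumes the four per-type counts are disjoint and add up to the total, which the example values suggest (4 + 21 + 93 + 9 = 127) but the task does not state. check_editor_counts is a hypothetical helper, not part of any existing API.

```python
import json

# Example record copied from the comment above.
RECORD = json.loads("""
{
  "wiki_id": "enwiki",
  "page_id": 12345,
  "editor_total_count": 127,
  "editor_bot_count": 4,
  "editor_logged_out_count": 21,
  "editor_permanent_count": 93,
  "editor_temporary_count": 9,
  "page_is_deleted": false,
  "updated_at": "2026-05-26 00:00:00.000Z"
}
""")

COUNT_FIELDS = (
    "editor_bot_count",
    "editor_logged_out_count",
    "editor_permanent_count",
    "editor_temporary_count",
)


def check_editor_counts(record: dict) -> bool:
    """Return True if the per-type counts add up to editor_total_count.

    Assumption (not confirmed by the task): the four categories are disjoint.
    """
    return sum(record[field] for field in COUNT_FIELDS) == record["editor_total_count"]


print(check_editor_counts(RECORD))  # True
```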

Thank you!

Tue, Jun 2, 4:23 PM · User-Eevans, Data-Persistence-Design-Review, Data-Persistence
AKhatun_WMF updated the task description for T427442: [Data Persistence Design Review] Editor count per page (Attribution API).
Tue, Jun 2, 4:22 PM · User-Eevans, Data-Persistence-Design-Review, Data-Persistence
AKhatun_WMF added a comment to T426316: WE5.3.3b: Contributor Count Per Page [Attribution API].

Details/Caveats of this dataset:

  • We are using event_user_text_historical, meaning user name changes are not considered, because the event_user_text field (the latest username) is not available in Inc. MWH. We use user_ids, so this should not affect permanent/temp users; it only affects anonymous users. Anon users can't change names, so our metric should be just fine.
  • We are using event_user_is_bot_by_historical, as the current bot status event_user_is_bot_by is not available in Inc. MWH. This means if someone was deemed a bot later on (or vice versa), we won't know until that user makes a new change in that page. We then receive an update for that user and change the bot field. We are able to take on the latest value of bot-ness (if available) because we do MAX_BY(editor_is_bot, event_timestamp).
  • page_is_deleted is NULL: There are cases where page_is_deleted can be NULL as set in MWH.
    • This field is absent from Incremental MWH. We may have to drop this field entirely.
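
The MAX_BY(editor_is_bot, event_timestamp) behaviour described above can be illustrated in plain Python. This is a sketch only, not the actual job: the Revision row layout, the editor_key values and the latest_bot_flags helper are hypothetical.

```python
from typing import Iterable, NamedTuple


class Revision(NamedTuple):
    page_id: int
    editor_key: str        # hypothetical unique editor identifier
    event_timestamp: str   # ISO-8601 UTC string, so string order equals time order
    editor_is_bot: bool


def latest_bot_flags(revisions: Iterable[Revision]) -> dict:
    """Per (page, editor), keep the bot flag of the most recent revision."""
    latest = {}
    for rev in revisions:
        key = (rev.page_id, rev.editor_key)
        if key not in latest or rev.event_timestamp > latest[key].event_timestamp:
            latest[key] = rev
    return {key: rev.editor_is_bot for key, rev in latest.items()}


revs = [
    Revision(1, "u:42", "2026-01-01T00:00:00Z", False),
    Revision(1, "u:42", "2026-03-01T00:00:00Z", True),   # flagged as bot later
    Revision(2, "u:42", "2026-01-15T00:00:00Z", False),  # no newer edit on page 2
]
flags = latest_bot_flags(revs)
print(flags[(1, "u:42")], flags[(2, "u:42")])  # True False
```

The second page shows the caveat: the same editor still counts as non-bot there until a newer edit on that page arrives.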
Tue, Jun 2, 1:53 AM · Patch-For-Review, Data-Engineering (Q4 FS25/26 April 1st - June 30st)

Mon, Jun 1

AKhatun_WMF added a comment to T426316: WE5.3.3b: Contributor Count Per Page [Attribution API].

Some analysis on user status (anonymous, temporary, permanent, cross-wiki)

  • In this metric we focus on event_user_central_id to get unique editor counts. Ideally all permanent and temporary users should have this id. For anonymous (aka IP) users, we use the event_user_text field.
  • Edge case: Some older revisions have permanent/temporary users who don't have an event_user_central_id.
    • We should use user_id for these
  • If both ids are NULL, we can fall back to user_text. Usually that is the case for anonymous users.
  • Cross-wiki users are marked as anonymous. This may be misleading as the users are not really anonymous. These do not have is_permanent or is_temporary set. So we don't know what kind of account the original user had directly from these fields.
    • We could include a cross_wiki_edit_count to count them separately as required.
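
The fallback order above (central id, then user_id, then user_text) could look roughly like this. It is a sketch under assumptions: the key prefixes and the editor_key and unique_editor_count helpers are hypothetical, and rows are simplified to (central_id, user_id, user_text) tuples.

```python
from typing import Iterable, Optional, Tuple

Row = Tuple[Optional[int], Optional[int], Optional[str]]


def editor_key(central_id: Optional[int],
               user_id: Optional[int],
               user_text: Optional[str]) -> Optional[str]:
    """Pick the most reliable identifier available for one revision's editor."""
    if central_id is not None:
        return f"central:{central_id}"   # preferred: event_user_central_id
    if user_id is not None:
        return f"local:{user_id}"        # older revisions without a central id
    if user_text:
        return f"text:{user_text}"       # usually anonymous (IP) editors
    return None                          # nothing usable; row is not counted


def unique_editor_count(rows: Iterable[Row]) -> int:
    """Count distinct editors on a page using the fallback key."""
    keys = {editor_key(*row) for row in rows}
    keys.discard(None)
    return len(keys)


rows = [
    (1001, 7, "Alice"),
    (1001, 7, "Alice"),          # same editor, second revision
    (None, 8, "OldAccount"),     # no central id, falls back to user_id
    (None, None, "192.0.2.1"),   # anonymous, falls back to user_text
]
print(unique_editor_count(rows))  # 3
```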
Mon, Jun 1, 9:10 PM · Patch-For-Review, Data-Engineering (Q4 FS25/26 April 1st - June 30st)
AKhatun_WMF added a comment to T426316: WE5.3.3b: Contributor Count Per Page [Attribution API].

Update:

Mon, Jun 1, 3:39 PM · Patch-For-Review, Data-Engineering (Q4 FS25/26 April 1st - June 30st)

Sat, May 30

AKhatun_WMF created T427701: Requesting access to Cassandra staging for akhatun.
Sat, May 30, 5:07 AM · SRE, SRE-Access-Requests

Fri, May 29

AKhatun_WMF updated the task description for T427442: [Data Persistence Design Review] Editor count per page (Attribution API).
Fri, May 29, 7:41 PM · User-Eevans, Data-Persistence-Design-Review, Data-Persistence

Thu, May 28

AKhatun_WMF updated the task description for T427442: [Data Persistence Design Review] Editor count per page (Attribution API).
Thu, May 28, 6:35 PM · User-Eevans, Data-Persistence-Design-Review, Data-Persistence
AKhatun_WMF created T427548: Check Editor Counts.
Thu, May 28, 6:03 PM · Data-Engineering (Q4 FS25/26 April 1st - June 30st)
AKhatun_WMF renamed T427442: [Data Persistence Design Review] Editor count per page (Attribution API) from [Data Persistence Design Review] <project> to [Data Persistence Design Review] Editor count per page (Attribution API).
Thu, May 28, 5:07 AM · User-Eevans, Data-Persistence-Design-Review, Data-Persistence

Wed, May 27

AKhatun_WMF created T427442: [Data Persistence Design Review] Editor count per page (Attribution API).
Wed, May 27, 7:33 PM · User-Eevans, Data-Persistence-Design-Review, Data-Persistence
AKhatun_WMF added a comment to T426316: WE5.3.3b: Contributor Count Per Page [Attribution API].

Full monthly load stats for all wikis, 2026-03 snapshot.

Wed, May 27, 5:52 AM · Patch-For-Review, Data-Engineering (Q4 FS25/26 April 1st - June 30st)
AKhatun_WMF updated the task description for T426316: WE5.3.3b: Contributor Count Per Page [Attribution API].
Wed, May 27, 5:27 AM · Patch-For-Review, Data-Engineering (Q4 FS25/26 April 1st - June 30st)
AKhatun_WMF updated the task description for T426316: WE5.3.3b: Contributor Count Per Page [Attribution API].
Wed, May 27, 5:16 AM · Patch-For-Review, Data-Engineering (Q4 FS25/26 April 1st - June 30st)
AKhatun_WMF updated the task description for T426316: WE5.3.3b: Contributor Count Per Page [Attribution API].
Wed, May 27, 5:08 AM · Patch-For-Review, Data-Engineering (Q4 FS25/26 April 1st - June 30st)

Tue, May 26

AKhatun_WMF updated the task description for T423920: Streaming HTML & Edit Types - productionization checklist.
Tue, May 26, 5:27 PM · Patch-For-Review, Data-Engineering (Q4 FS25/26 April 1st - June 30st), Event-Platform
AKhatun_WMF updated the task description for T426316: WE5.3.3b: Contributor Count Per Page [Attribution API].
Tue, May 26, 4:58 PM · Patch-For-Review, Data-Engineering (Q4 FS25/26 April 1st - June 30st)
AKhatun_WMF moved T426316: WE5.3.3b: Contributor Count Per Page [Attribution API] from Next Up to In progress on the Data-Engineering (Q4 FS25/26 April 1st - June 30st) board.
Tue, May 26, 3:45 PM · Patch-For-Review, Data-Engineering (Q4 FS25/26 April 1st - June 30st)

Wed, May 20

AKhatun_WMF added a comment to T423920: Streaming HTML & Edit Types - productionization checklist.

Ah, that makes sense! We don't need to get data into sanitized right now. Just wanted to inform. But looks like we are good. Thanks!

Wed, May 20, 7:46 PM · Patch-For-Review, Data-Engineering (Q4 FS25/26 April 1st - June 30st), Event-Platform
AKhatun_WMF moved T351225: Productionized Edit Types from Ready to Deploy to Done on the Data-Engineering (Q4 FS25/26 April 1st - June 30st) board.
Wed, May 20, 7:37 PM · Data-Engineering (Q4 FS25/26 April 1st - June 30st), Patch-For-Review, Research-Freezer, Event-Platform, Research-engineering
AKhatun_WMF moved T424364: Delete old edit-type stream from Next Up to Done on the Data-Engineering (Q4 FS25/26 April 1st - June 30st) board.
Wed, May 20, 7:32 PM · Data-Engineering (Q4 FS25/26 April 1st - June 30st), Event-Platform
AKhatun_WMF updated the task description for T424364: Delete old edit-type stream.
Wed, May 20, 7:31 PM · Data-Engineering (Q4 FS25/26 April 1st - June 30st), Event-Platform
AKhatun_WMF updated the task description for T426870: Move away from redacteddb for sqoop jobs.
Wed, May 20, 3:15 PM · Data-Engineering
AKhatun_WMF created T426870: Move away from redacteddb for sqoop jobs.
Wed, May 20, 3:14 PM · Data-Engineering
AKhatun_WMF updated the task description for T424364: Delete old edit-type stream.
Wed, May 20, 2:43 PM · Data-Engineering (Q4 FS25/26 April 1st - June 30st), Event-Platform
AKhatun_WMF updated the task description for T423920: Streaming HTML & Edit Types - productionization checklist.
Wed, May 20, 2:40 PM · Patch-For-Review, Data-Engineering (Q4 FS25/26 April 1st - June 30st), Event-Platform
AKhatun_WMF added a comment to T423920: Streaming HTML & Edit Types - productionization checklist.

Wanted to note here:

  • The html counts dataset has data from 2026-05-17 14 since the change was merged/deployed on ~18th May.
  • In event sanitized. event has data from 2026-05-01 00.
Wed, May 20, 2:40 PM · Patch-For-Review, Data-Engineering (Q4 FS25/26 April 1st - June 30st), Event-Platform
AKhatun_WMF added a comment to T425573: mediawiki_history_incremental_v1: schema specification for stakeholder review.

Following up on my comment T425573#11913740 and adding to @nshahquinn-wmf: It would help to add the MWH fields in snapshot rows even if they don't exist in event rows. For contributor counts as well, it is agreeable to have some fields populated monthly and reconcile with current data on our side as seen fit, but at least having the fields in the same table is helpful for that.

Wed, May 20, 2:04 PM · DPE-MediaWiki-Incremental-History, Data-Engineering (Q4 FS25/26 April 1st - June 30st)

Fri, May 15

AKhatun_WMF added a comment to T422030: Surge in webrequest validation check.

@xcollazo Yes, we still have very frequent warnings.
https://airflow.wikimedia.org/dags/refine_webrequest_hourly_text/grid?search=refine_webrequest_hourly_text
Almost all the warning emails were sent here.

Fri, May 15, 2:58 PM · Patch-For-Review, Data-Platform-SRE (2026-03-27 - 2026-04-17), Data-Engineering (Q4 FS25/26 April 1st - June 30st), Traffic

Thu, May 14

AKhatun_WMF added a comment to T423920: Streaming HTML & Edit Types - productionization checklist.

We should also get rid of the hive tables for dev and rc0 versions. Can we just drop the tables? Do we also need to clean up the hdfs files?

Thu, May 14, 4:22 PM · Patch-For-Review, Data-Engineering (Q4 FS25/26 April 1st - June 30st), Event-Platform
AKhatun_WMF added a comment to T423920: Streaming HTML & Edit Types - productionization checklist.

Want to add

eqiad.mw_page_edit_type_enrich.error
Thu, May 14, 4:19 PM · Patch-For-Review, Data-Engineering (Q4 FS25/26 April 1st - June 30st), Event-Platform
AKhatun_WMF created T426316: WE5.3.3b: Contributor Count Per Page [Attribution API].
Thu, May 14, 12:59 PM · Patch-For-Review, Data-Engineering (Q4 FS25/26 April 1st - June 30st)

Tue, May 12

AKhatun_WMF removed a project from T424547: Edit type enrichment: Add timeout: Data-Engineering (Q4 FS25/26 April 1st - June 30st).
Tue, May 12, 11:29 PM · Data-Engineering (Q4 FS25/26 April 1st - June 30st), Event-Platform
AKhatun_WMF added a comment to T425573: mediawiki_history_incremental_v1: schema specification for stakeholder review.

We have another use-case that wants to use MWH (hopefully). Contributors Count

The number of unique editors that have contributed to a given article within a Wikimedia project. Ideally, this data point would then be able to be split based on the type of editor; for example, a community bot, a logged in user, or an anonymous user.

Tue, May 12, 4:22 PM · DPE-MediaWiki-Incremental-History, Data-Engineering (Q4 FS25/26 April 1st - June 30st)

Mon, May 11

AKhatun_WMF moved T425569: `mw_content_reconcile_mw_content_history_daily`: NoClassDefFoundError(EventStreamFactory) in spark_emit_reconcile_events_to_kafka from Next Up to Done on the Data-Engineering (Q4 FS25/26 April 1st - June 30st) board.
Mon, May 11, 3:33 PM · Data-Engineering (Q4 FS25/26 April 1st - June 30st)

May 7 2026

AKhatun_WMF added a comment to T425569: `mw_content_reconcile_mw_content_history_daily`: NoClassDefFoundError(EventStreamFactory) in spark_emit_reconcile_events_to_kafka.

Upgraded to eventutilities-spark 1.4.6 with dependencies. Re-running did not work: since it is already reconciled now, no new events were being emitted to kafka, and the code path was not being run.

May 7 2026, 2:14 PM · Data-Engineering (Q4 FS25/26 April 1st - June 30st)

May 6 2026

AKhatun_WMF updated subscribers of T425569: `mw_content_reconcile_mw_content_history_daily`: NoClassDefFoundError(EventStreamFactory) in spark_emit_reconcile_events_to_kafka.

After some debugging with @JAllemandou we found that with ivy, all dependencies of eventutilities-spark were being downloaded automatically. But when we set artifacts explicitly with artifact("eventutilities-spark-1.4.1.jar"), the dependencies are not auto-resolved. We need to add a -with-dependencies version of the jar. 1.4.1 does not have a -with-dependencies jar. Will try eventutilities-spark-1.4.6-shaded-with-dependencies.jar locally and create an MR if that works.

May 6 2026, 7:39 PM · Data-Engineering (Q4 FS25/26 April 1st - June 30st)
AKhatun_WMF renamed T425569: `mw_content_reconcile_mw_content_history_daily`: NoClassDefFoundError(EventStreamFactory) in spark_emit_reconcile_events_to_kafka from `mw_content_reconcile_*`: NoClassDefFoundError(EventStreamFactory) in spark_emit_reconcile_events_to_kafka to `mw_content_reconcile_mw_content_history_daily`: NoClassDefFoundError(EventStreamFactory) in spark_emit_reconcile_events_to_kafka.
May 6 2026, 3:42 PM · Data-Engineering (Q4 FS25/26 April 1st - June 30st)
AKhatun_WMF created T425569: `mw_content_reconcile_mw_content_history_daily`: NoClassDefFoundError(EventStreamFactory) in spark_emit_reconcile_events_to_kafka.
May 6 2026, 3:42 PM · Data-Engineering (Q4 FS25/26 April 1st - June 30st)

May 5 2026

AKhatun_WMF added a comment to T422030: Surge in webrequest validation check.

From current Ops Week:

  • The ERROR emails have stopped.
  • We are still getting WARNING emails quite frequently:
    • refine_webrequest_hourly_text, pretty much every hour.
    • refine_webrequest_hourly_upload ~10 in the last 2 days.
May 5 2026, 8:40 PM · Patch-For-Review, Data-Platform-SRE (2026-03-27 - 2026-04-17), Data-Engineering (Q4 FS25/26 April 1st - June 30st), Traffic
AKhatun_WMF renamed T425443: Mediawiki History Failure [2026-04] from Mediawiki History Failure to Mediawiki History Failure [2026-04].
May 5 2026, 7:54 PM · Data-Engineering (Q4 FS25/26 April 1st - June 30st)
AKhatun_WMF updated the task description for T425443: Mediawiki History Failure [2026-04].
May 5 2026, 7:52 PM · Data-Engineering (Q4 FS25/26 April 1st - June 30st)
AKhatun_WMF added a comment to T425443: Mediawiki History Failure [2026-04].

Slack thread

May 5 2026, 7:50 PM · Data-Engineering (Q4 FS25/26 April 1st - June 30st)
AKhatun_WMF updated the task description for T425443: Mediawiki History Failure [2026-04].
May 5 2026, 7:38 PM · Data-Engineering (Q4 FS25/26 April 1st - June 30st)
AKhatun_WMF added a project to T425384: imagelinks table missing in pplwiki, several other tables missing in testcommonswiki: Data-Persistence.
May 5 2026, 5:34 PM · Data-Persistence, Data-Services, cloud-services-team
AKhatun_WMF created T425443: Mediawiki History Failure [2026-04].
May 5 2026, 4:50 PM · Data-Engineering (Q4 FS25/26 April 1st - June 30st)
AKhatun_WMF updated the task description for T425385: Inconsistent wiki list: grouped_wikis.csv extended *after* some sqoop jobs have already started.
May 5 2026, 2:31 AM · Essential-Work, Data-Engineering (Q4 FS25/26 April 1st - June 30st)
AKhatun_WMF created T425385: Inconsistent wiki list: grouped_wikis.csv extended *after* some sqoop jobs have already started.
May 5 2026, 2:31 AM · Essential-Work, Data-Engineering (Q4 FS25/26 April 1st - June 30st)
AKhatun_WMF created T425384: imagelinks table missing in pplwiki, several other tables missing in testcommonswiki.
May 5 2026, 1:46 AM · Data-Persistence, Data-Services, cloud-services-team

May 4 2026

AKhatun_WMF moved T425362: Edit type enrichment: Watermarks lagging from Next Up to Done on the Data-Engineering (Q4 FS25/26 April 1st - June 30st) board.
May 4 2026, 10:47 PM · Data-Engineering (Q4 FS25/26 April 1st - June 30st), Event-Platform
AKhatun_WMF added a comment to T418804: table_maintenance_iceberg_monthly permission issue fails task due to permission on Ivy cache artifact.

Same problem today for maintenance dags in main airflow

Exception in thread "main" java.io.FileNotFoundException: /tmp/table_maintenance_iceberg_monthly/ivy_spark3/cache/resolved-org.apache.spark-spark-submit-parent-7071631f-d152-48b4-bb0f-788ee707e4d1-1.0.xml (Permission denied)
May 4 2026, 4:57 PM · Data-Engineering (Q4 FS25/26 April 1st - June 30st)
AKhatun_WMF added a comment to T425362: Edit type enrichment: Watermarks lagging.

@Ottomata Sorry, I can't fully remember what happened when DC switchover happened. Wanted to confirm, when DC switchover does happen, page_change_v1 will have events in codfw. html_content_change will consume from codfw.page_change.v1 but output to eqiad.page_html_content_change.v1, correct?
Then page_html_feature_counts_change will always ingest from eqiad and output to eqiad.

May 4 2026, 4:09 PM · Data-Engineering (Q4 FS25/26 April 1st - June 30st), Event-Platform
AKhatun_WMF created T425362: Edit type enrichment: Watermarks lagging.
May 4 2026, 3:53 PM · Data-Engineering (Q4 FS25/26 April 1st - June 30st), Event-Platform

May 1 2026

AKhatun_WMF updated the task description for T424364: Delete old edit-type stream.
May 1 2026, 10:03 PM · Data-Engineering (Q4 FS25/26 April 1st - June 30st), Event-Platform
AKhatun_WMF moved T410940: WE1.5.3 Productize Data for Monthly Active Moderator Actions from In progress to Done on the Data-Engineering (Q4 FS25/26 April 1st - June 30st) board.
May 1 2026, 9:57 PM · Data-Engineering (Q4 FS25/26 April 1st - June 30st), OKR-Work (WE1 FY2025-26)
AKhatun_WMF added a comment to T410940: WE1.5.3 Productize Data for Monthly Active Moderator Actions.

The hypothesis is now complete. Final update can be found here: https://app.asana.com/0/0/1214459375535326

May 1 2026, 9:55 PM · Data-Engineering (Q4 FS25/26 April 1st - June 30st), OKR-Work (WE1 FY2025-26)
AKhatun_WMF updated the task description for T351225: Productionized Edit Types.
May 1 2026, 4:40 AM · Data-Engineering (Q4 FS25/26 April 1st - June 30st), Patch-For-Review, Research-Freezer, Event-Platform, Research-engineering
AKhatun_WMF moved T424918: Validate new streaming edit-type dataset with historical research edit-type dataset from Next Up to Done on the Data-Engineering (Q4 FS25/26 April 1st - June 30st) board.
May 1 2026, 4:38 AM · Data-Engineering (Q4 FS25/26 April 1st - June 30st), Event-Platform
AKhatun_WMF moved T424225: Edit type Enrichment: Update documentation from Next Up to Done on the Data-Engineering (Q4 FS25/26 April 1st - June 30st) board.
May 1 2026, 4:30 AM · Patch-For-Review, Data-Engineering (Q4 FS25/26 April 1st - June 30st), Event-Platform
AKhatun_WMF updated the task description for T424225: Edit type Enrichment: Update documentation.
May 1 2026, 4:29 AM · Patch-For-Review, Data-Engineering (Q4 FS25/26 April 1st - June 30st), Event-Platform

Apr 30 2026

AKhatun_WMF updated the task description for T423920: Streaming HTML & Edit Types - productionization checklist.
Apr 30 2026, 3:25 PM · Patch-For-Review, Data-Engineering (Q4 FS25/26 April 1st - June 30st), Event-Platform
AKhatun_WMF moved T424224: Edit type enrichment: Alerting from Next Up to Done on the Data-Engineering (Q4 FS25/26 April 1st - June 30st) board.
Apr 30 2026, 2:57 PM · Data-Engineering (Q4 FS25/26 April 1st - June 30st), Event-Platform
AKhatun_WMF updated the task description for T424224: Edit type enrichment: Alerting.
Apr 30 2026, 2:56 PM · Data-Engineering (Q4 FS25/26 April 1st - June 30st), Event-Platform
AKhatun_WMF updated subscribers of T424918: Validate new streaming edit-type dataset with historical research edit-type dataset.

@CMyrick-WMF: You should (if you haven't already) deduplicate on wiki_id, rev_id. As we've already noticed, other events (deletes/moves) can contain the same old rev_id, and hence produce a duplicate of the edit-types. Plus, sometimes the same event is duplicated too, due to reprocessing for instance.
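
A minimal sketch of deduplicating on (wiki_id, rev_id), assuming events are plain dicts carrying those two fields. Keeping the first event seen for each key is an arbitrary choice made for illustration; dedupe_by_revision is a hypothetical helper.

```python
def dedupe_by_revision(events):
    """Keep the first event seen for each (wiki_id, rev_id) pair."""
    seen = set()
    unique = []
    for event in events:
        key = (event["wiki_id"], event["rev_id"])
        if key not in seen:
            seen.add(key)
            unique.append(event)
    return unique


events = [
    {"wiki_id": "enwiki", "rev_id": 100, "kind": "edit"},
    {"wiki_id": "enwiki", "rev_id": 100, "kind": "move"},   # same old rev_id
    {"wiki_id": "enwiki", "rev_id": 100, "kind": "edit"},   # reprocessed copy
    {"wiki_id": "dewiki", "rev_id": 100, "kind": "edit"},   # different wiki
]
deduped = dedupe_by_revision(events)
print(len(deduped))  # 2
```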

Apr 30 2026, 4:52 AM · Data-Engineering (Q4 FS25/26 April 1st - June 30st), Event-Platform
AKhatun_WMF created T424918: Validate new streaming edit-type dataset with historical research edit-type dataset.
Apr 30 2026, 4:51 AM · Data-Engineering (Q4 FS25/26 April 1st - June 30st), Event-Platform
AKhatun_WMF added a comment to T423920: Streaming HTML & Edit Types - productionization checklist.

What do we need to do to have these datasets in event_sanitized?

Apr 30 2026, 4:28 AM · Patch-For-Review, Data-Engineering (Q4 FS25/26 April 1st - June 30st), Event-Platform

Apr 29 2026

AKhatun_WMF added a comment to T424231: Edit type Enrichment: Update flink code.

So html pipeline

  • skips adding current html for create or delete
  • skips adding parent html for create or delete or parent_rev_id==0
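
The two skip rules above can be written as simple predicates. This is only a sketch of the rules as stated in this comment, not the Flink code itself; the function names and the event_kind and parent_rev_id parameters are hypothetical.

```python
SKIPPED_KINDS = ("create", "delete")


def should_add_current_html(event_kind: str) -> bool:
    """Current HTML is skipped for create and delete events."""
    return event_kind not in SKIPPED_KINDS


def should_add_parent_html(event_kind: str, parent_rev_id: int) -> bool:
    """Parent HTML is skipped for create, delete, or when parent_rev_id is 0."""
    return event_kind not in SKIPPED_KINDS and parent_rev_id != 0


print(should_add_current_html("edit"), should_add_parent_html("edit", 0))  # True False
```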
Apr 29 2026, 4:52 PM · Data-Engineering (Q4 FS25/26 April 1st - June 30st), Event-Platform

Apr 28 2026

AKhatun_WMF updated the task description for T424224: Edit type enrichment: Alerting.
Apr 28 2026, 8:07 PM · Data-Engineering (Q4 FS25/26 April 1st - June 30st), Event-Platform

Apr 27 2026

AKhatun_WMF updated the task description for T423920: Streaming HTML & Edit Types - productionization checklist.
Apr 27 2026, 4:39 PM · Patch-For-Review, Data-Engineering (Q4 FS25/26 April 1st - June 30st), Event-Platform
AKhatun_WMF updated the task description for T424364: Delete old edit-type stream.
Apr 27 2026, 4:36 PM · Data-Engineering (Q4 FS25/26 April 1st - June 30st), Event-Platform
AKhatun_WMF updated the task description for T424364: Delete old edit-type stream.
Apr 27 2026, 4:35 PM · Data-Engineering (Q4 FS25/26 April 1st - June 30st), Event-Platform
AKhatun_WMF updated the task description for T424364: Delete old edit-type stream.
Apr 27 2026, 4:30 PM · Data-Engineering (Q4 FS25/26 April 1st - June 30st), Event-Platform
AKhatun_WMF claimed T424225: Edit type Enrichment: Update documentation.
Apr 27 2026, 4:17 PM · Patch-For-Review, Data-Engineering (Q4 FS25/26 April 1st - June 30st), Event-Platform
AKhatun_WMF moved T424231: Edit type Enrichment: Update flink code from Next Up to Done on the Data-Engineering (Q4 FS25/26 April 1st - June 30st) board.
Apr 27 2026, 4:15 PM · Data-Engineering (Q4 FS25/26 April 1st - June 30st), Event-Platform
AKhatun_WMF moved T424223: Edit type enrichment: Update stream to use new deployment config from Next Up to Done on the Data-Engineering (Q4 FS25/26 April 1st - June 30st) board.
Apr 27 2026, 4:14 PM · Data-Engineering (Q4 FS25/26 April 1st - June 30st), Event-Platform
AKhatun_WMF moved T421919: Backfill newly productionized edit types dataset from In progress to Done on the Data-Engineering (Q4 FS25/26 April 1st - June 30st) board.
Apr 27 2026, 4:14 PM · Data-Engineering (Q4 FS25/26 April 1st - June 30st)
AKhatun_WMF moved T421026: Debug edit type pipeline for production readiness from In progress to Done on the Data-Engineering (Q4 FS25/26 April 1st - June 30st) board.
Apr 27 2026, 4:14 PM · Data-Engineering (Q4 FS25/26 April 1st - June 30st), Event-Platform
AKhatun_WMF updated the task description for T421026: Debug edit type pipeline for production readiness.
Apr 27 2026, 4:13 PM · Data-Engineering (Q4 FS25/26 April 1st - June 30st), Event-Platform
AKhatun_WMF added a parent task for T424547: Edit type enrichment: Add timeout: T423920: Streaming HTML & Edit Types - productionization checklist.
Apr 27 2026, 4:12 PM · Data-Engineering (Q4 FS25/26 April 1st - June 30st), Event-Platform
AKhatun_WMF added a subtask for T423920: Streaming HTML & Edit Types - productionization checklist: T424547: Edit type enrichment: Add timeout.
Apr 27 2026, 4:12 PM · Patch-For-Review, Data-Engineering (Q4 FS25/26 April 1st - June 30st), Event-Platform
AKhatun_WMF created T424547: Edit type enrichment: Add timeout.
Apr 27 2026, 4:02 PM · Data-Engineering (Q4 FS25/26 April 1st - June 30st), Event-Platform

Apr 25 2026

AKhatun_WMF attached a referenced file: F77183482: n-daily-users-who-inserted-messagebox-or-inline-cleanup-notes-2026-04-21T21-58-41.888Z.jpg.
Apr 25 2026, 2:38 AM · Data-Engineering (Q4 FS25/26 April 1st - June 30st)

Apr 24 2026

AKhatun_WMF added a comment to T424231: Edit type Enrichment: Update flink code.

Yes, the html pipeline is just not adding the diff, the current html is present.

Apr 24 2026, 5:48 PM · Data-Engineering (Q4 FS25/26 April 1st - June 30st), Event-Platform
AKhatun_WMF updated the task description for T424231: Edit type Enrichment: Update flink code.
Apr 24 2026, 5:35 PM · Data-Engineering (Q4 FS25/26 April 1st - June 30st), Event-Platform
AKhatun_WMF updated the task description for T424223: Edit type enrichment: Update stream to use new deployment config.
Apr 24 2026, 5:34 PM · Data-Engineering (Q4 FS25/26 April 1st - June 30st), Event-Platform
AKhatun_WMF added a comment to T424231: Edit type Enrichment: Update flink code.

@Ottomata, @JMonton-WMF
There are some events in error sink. Not a new error, there are events in old edit-type error sink with the same error messages. I've checked, all of them are because

	"$schema": "/error/2.1.0",
	"dt": "2026-04-23T22:02:40Z",
	"emitter_id": "mw-page-edit-type-enrich-next",
	"error_type": "ValueError",
	"errored_schema_uri": "/development/rendering_content_change/1.0.0",
	"errored_stream_name": "mediawiki.page_html_content_change.dev5",
	"message": "ValueError(\"Required field(s) missing or empty: 'delta.revision.rendering.content.content_body' (unified diff). Cannot proceed with enrichment for event (meta_id=67a12db2-44db-4942-9438-6fdae30a0537; rev_id=982629652; domain=en.wikipedia.org).\")",

(took an enwiki example from the previous stream for convenience)
the incoming event's delta is null. All of them are undelete events. And I have spot checked: these rev_ids don't have a parent rev_id (example revision). So it makes sense that the delta is null, and we should be ok to ignore these. Wondering if we should handle these events, or if letting them go to the error sink is fine?

Apr 24 2026, 5:30 PM · Data-Engineering (Q4 FS25/26 April 1st - June 30st), Event-Platform
AKhatun_WMF updated the task description for T424364: Delete old edit-type stream.
Apr 24 2026, 4:17 PM · Data-Engineering (Q4 FS25/26 April 1st - June 30st), Event-Platform
AKhatun_WMF created T424364: Delete old edit-type stream.
Apr 24 2026, 4:01 PM · Data-Engineering (Q4 FS25/26 April 1st - June 30st), Event-Platform

Apr 23 2026

AKhatun_WMF updated the task description for T424231: Edit type Enrichment: Update flink code.
Apr 23 2026, 10:08 PM · Data-Engineering (Q4 FS25/26 April 1st - June 30st), Event-Platform
AKhatun_WMF updated subscribers of T424223: Edit type enrichment: Update stream to use new deployment config.

With the new stream mediawiki.page_html_feature_counts_change.rc0 declared https://gerrit.wikimedia.org/r/c/operations/mediawiki-config/+/1276397, requesting SRE help to

Apr 23 2026, 10:04 PM · Data-Engineering (Q4 FS25/26 April 1st - June 30st), Event-Platform
AKhatun_WMF renamed T424223: Edit type enrichment: Update stream to use new deployment config from Update edit type stream to use new deployment config to Edit type enrichment: Update stream to use new deployment config.
Apr 23 2026, 2:53 PM · Data-Engineering (Q4 FS25/26 April 1st - June 30st), Event-Platform
AKhatun_WMF created T424231: Edit type Enrichment: Update flink code.
Apr 23 2026, 2:51 PM · Data-Engineering (Q4 FS25/26 April 1st - June 30st), Event-Platform
AKhatun_WMF created T424225: Edit type Enrichment: Update documentation.
Apr 23 2026, 1:32 PM · Patch-For-Review, Data-Engineering (Q4 FS25/26 April 1st - June 30st), Event-Platform
AKhatun_WMF updated the task description for T424224: Edit type enrichment: Alerting.
Apr 23 2026, 1:31 PM · Data-Engineering (Q4 FS25/26 April 1st - June 30st), Event-Platform
AKhatun_WMF created T424224: Edit type enrichment: Alerting.
Apr 23 2026, 1:30 PM · Data-Engineering (Q4 FS25/26 April 1st - June 30st), Event-Platform
AKhatun_WMF created T424223: Edit type enrichment: Update stream to use new deployment config.
Apr 23 2026, 1:29 PM · Data-Engineering (Q4 FS25/26 April 1st - June 30st), Event-Platform

Apr 21 2026

AKhatun_WMF added a comment to T421919: Backfill newly productionized edit types dataset.

Backfill is now complete. akhatun.edit_type_v3 contains edit-type data from ns0 and just Wikipedias. Uses mwedittypes v3.1.0 and mwparserfromhtml v2.1.1.

Apr 21 2026, 9:51 PM · Data-Engineering (Q4 FS25/26 April 1st - June 30st)