
Snwachukwu (Sandra Ebele Nwachukwu)
User

User Details

User Since
Jan 6 2022, 11:29 AM (205 w, 4 d)
Availability
Available
LDAP User
Snwachukwu
MediaWiki User
SNwachukwu (WMF) [ Global Accounts ]

Recent Activity

Mon, Dec 8

Snwachukwu added a comment to T409601: Review and productionize the WME differential privacy data set.

@Htriedman I created an MR to add the DDL Scripts to your wmepageview repo. Please review. I have added it to your repo and afterwards we can update the production repository.

Mon, Dec 8, 7:50 PM · Patch-For-Review, Data-Engineering (Q2 FY25/26 October 1st - December 31th)
Snwachukwu updated the task description for T412035: Upgrade Airflow HdfsEmailOperator to take both a String or a List(String) email addresses..
Mon, Dec 8, 4:38 PM · Data-Engineering (Q2 FY25/26 October 1st - December 31th)
Snwachukwu created T412035: Upgrade Airflow HdfsEmailOperator to take both a String or a List(String) email addresses..
Mon, Dec 8, 4:37 PM · Data-Engineering (Q2 FY25/26 October 1st - December 31th)

Fri, Dec 5

Snwachukwu claimed T411876: Add new data-steward email to Human-Bot Alert email..
Fri, Dec 5, 4:47 PM · Data-Engineering (Q2 FY25/26 October 1st - December 31th)
Snwachukwu created T411876: Add new data-steward email to Human-Bot Alert email..
Fri, Dec 5, 4:47 PM · Data-Engineering (Q2 FY25/26 October 1st - December 31th)
Snwachukwu added a comment to T409601: Review and productionize the WME differential privacy data set.

@Htriedman, the database is created and now I want to create the tables. I have prepared the create table statements. I just need you to confirm the column comments before I create them. See patch.

Fri, Dec 5, 4:14 PM · Patch-For-Review, Data-Engineering (Q2 FY25/26 October 1st - December 31th)

Wed, Dec 3

Snwachukwu added a comment to T409601: Review and productionize the WME differential privacy data set.

To point (1) — in WME datasets, the page_id is referred to as identifier (inside a JSON object). Because this dataset is going to be used in a WME context, I plan to stick with that convention.

In that case should we rename the page_id column in the hourly table pageview_hourly_proportion to identifier for consistency?

Wed, Dec 3, 7:30 PM · Patch-For-Review, Data-Engineering (Q2 FY25/26 October 1st - December 31th)
Snwachukwu added a comment to T409601: Review and productionize the WME differential privacy data set.

Thanks @Htriedman for all your responses. I'm about to create the tables in the newly created database. I was wondering, do you have a folder for create table statements? I can't seem to find any in your repos.

Wed, Dec 3, 6:53 PM · Patch-For-Review, Data-Engineering (Q2 FY25/26 October 1st - December 31th)

Tue, Dec 2

Snwachukwu added a comment to T409601: Review and productionize the WME differential privacy data set.

@Htriedman, regarding the request to move the code to a non-user repo, I'm a little confused. It seems it has already been moved, because I can see a similar repo in the WME directory here. Can I say that this task:

Tue, Dec 2, 6:34 PM · Patch-For-Review, Data-Engineering (Q2 FY25/26 October 1st - December 31th)

Mon, Dec 1

Snwachukwu added a comment to T409601: Review and productionize the WME differential privacy data set.

Regarding the schema, @Htriedman, I took a look at the hourly job output table schemas and noticed the following:

  1. The table htriedman.pageview_combined_analytics has a column identifier which is the page_id. This column should be named page_id to be consistent with other tables.
  2. namespace_id is in neither of the tables. I suggest adding it, since it is part of a page's metadata.
Mon, Dec 1, 8:11 PM · Patch-For-Review, Data-Engineering (Q2 FY25/26 October 1st - December 31th)
Snwachukwu closed T410289: Develop HQL Scripts for Creating Global Editor Metrics Cassandra Tables in Hive., a subtask of T405039: Global Editor Metrics - Data Pipeline, as Resolved.
Mon, Dec 1, 8:10 PM · Patch-For-Review, Data-Engineering (Q2 FY25/26 October 1st - December 31th), OKR-Work, MediaWiki-Page-derived-data
Snwachukwu closed T410289: Develop HQL Scripts for Creating Global Editor Metrics Cassandra Tables in Hive. as Resolved.
Mon, Dec 1, 8:10 PM · Data-Engineering (Q2 FY25/26 October 1st - December 31th)
Snwachukwu added a comment to T409601: Review and productionize the WME differential privacy data set.

Hi @Htriedman. Can you help me confirm that this is the complete list of output tables that need to be moved to a non-user database?
Hourly updated tables:

  • htriedman.pageview_hourly_proportion
  • htriedman.pageview_combined_analytics

Daily updated tables:

  • htriedman.pageview_geo_distribution
  • htriedman.pageview_geo_top10

Monthly updated tables:

  • htriedman.pageview_associated_distribution
  • htriedman.pageview_associated_top10
Mon, Dec 1, 7:59 PM · Patch-For-Review, Data-Engineering (Q2 FY25/26 October 1st - December 31th)
Snwachukwu renamed T411378: Human vs Bot Alerting Email Upgrade from Human vs Bot ALERTING to Human vs Bot Alerting Email Upgrade.
Mon, Dec 1, 4:29 PM · Data-Engineering (Q2 FY25/26 October 1st - December 31th)
Snwachukwu moved T411378: Human vs Bot Alerting Email Upgrade from Next Up to In progress on the Data-Engineering (Q2 FY25/26 October 1st - December 31th) board.
Mon, Dec 1, 4:28 PM · Data-Engineering (Q2 FY25/26 October 1st - December 31th)
Snwachukwu created T411378: Human vs Bot Alerting Email Upgrade.
Mon, Dec 1, 4:28 PM · Data-Engineering (Q2 FY25/26 October 1st - December 31th)

Wed, Nov 26

Snwachukwu claimed T409601: Review and productionize the WME differential privacy data set.
Wed, Nov 26, 7:50 PM · Patch-For-Review, Data-Engineering (Q2 FY25/26 October 1st - December 31th)

Mon, Nov 17

Snwachukwu claimed T410289: Develop HQL Scripts for Creating Global Editor Metrics Cassandra Tables in Hive..
Mon, Nov 17, 6:39 PM · Data-Engineering (Q2 FY25/26 October 1st - December 31th)
Snwachukwu added a subtask for T405039: Global Editor Metrics - Data Pipeline: T410289: Develop HQL Scripts for Creating Global Editor Metrics Cassandra Tables in Hive..
Mon, Nov 17, 6:39 PM · Patch-For-Review, Data-Engineering (Q2 FY25/26 October 1st - December 31th), OKR-Work, MediaWiki-Page-derived-data
Snwachukwu added a parent task for T410289: Develop HQL Scripts for Creating Global Editor Metrics Cassandra Tables in Hive.: T405039: Global Editor Metrics - Data Pipeline.
Mon, Nov 17, 6:39 PM · Data-Engineering (Q2 FY25/26 October 1st - December 31th)
Snwachukwu changed the status of T410289: Develop HQL Scripts for Creating Global Editor Metrics Cassandra Tables in Hive. from Open to In Progress.
Mon, Nov 17, 4:55 PM · Data-Engineering (Q2 FY25/26 October 1st - December 31th)
Snwachukwu created T410289: Develop HQL Scripts for Creating Global Editor Metrics Cassandra Tables in Hive..
Mon, Nov 17, 4:54 PM · Data-Engineering (Q2 FY25/26 October 1st - December 31th)
Snwachukwu moved T401022: Implement the data layout, UI, and documentation for the XML file export from In progress to Done on the Data-Engineering (Q2 FY25/26 October 1st - December 31th) board.
Mon, Nov 17, 4:34 PM · Patch-For-Review, Data-Engineering (Q2 FY25/26 October 1st - December 31th)
Snwachukwu moved T408918: Upgrade mediawiki-event-enrichment jobs to Flink 1.20.2 and Java 17 from In progress to Blocked/Paused on the Data-Engineering (Q2 FY25/26 October 1st - December 31th) board.
Mon, Nov 17, 4:32 PM · Patch-For-Review, Event-Platform, Data-Engineering (Q2 FY25/26 October 1st - December 31th), Essential-Work
Snwachukwu moved T407239: SDS 1.3.2 Implementation from In Review to Done on the Data-Engineering (Q2 FY25/26 October 1st - December 31th) board.
Mon, Nov 17, 4:29 PM · Patch-For-Review, OKR-Work, Data-Engineering (Q2 FY25/26 October 1st - December 31th)
Snwachukwu moved T409099: Iceberg Merge strategies with dbt from In Review to Done on the Data-Engineering (Q2 FY25/26 October 1st - December 31th) board.
Mon, Nov 17, 4:22 PM · Data-Engineering (Q2 FY25/26 October 1st - December 31th)

Nov 7 2025

Snwachukwu added a comment to T397076: Re-enable WMF-NDA access for Miriam and Snwachukwu.

Thank you @mmartorana

Nov 7 2025, 8:46 PM · SecTeam-Processed, Security, Security-Team

Nov 3 2025

Snwachukwu added a comment to T397076: Re-enable WMF-NDA access for Miriam and Snwachukwu.

@mmartorana Oh, please, I'd like my access back if it's not too late.

Nov 3 2025, 4:07 PM · SecTeam-Processed, Security, Security-Team

Oct 31 2025

Snwachukwu added a comment to T407239: SDS 1.3.2 Implementation.

@xcollazo . I have linked it.

Oct 31 2025, 5:29 PM · Patch-For-Review, OKR-Work, Data-Engineering (Q2 FY25/26 October 1st - December 31th)
Snwachukwu updated the task description for T407239: SDS 1.3.2 Implementation.
Oct 31 2025, 5:25 PM · Patch-For-Review, OKR-Work, Data-Engineering (Q2 FY25/26 October 1st - December 31th)

Oct 29 2025

Snwachukwu added a comment to T406882: SDS 1.3.2 Conduct Analysis on Alerting for changes in automated traffic distribution.

Yes, for clarity, here is the approach we decided on to define our thresholds:

  • Use fixed thresholds and revisit them after a period of time (1 year suggested).
  • Use quantiles to define the thresholds.
  • The current quantile values are derived from data between March 20th and Oct 15 2025.
Oct 29 2025, 1:54 PM · OKR-Work, Data-Engineering (Q2 FY25/26 October 1st - December 31th)
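
The fixed, quantile-based thresholds described in this comment could be sketched roughly as follows. This is only an illustration: the function names, the 5th/95th percentile choice, and the sample values are assumptions, not the values or code used in the task.

```python
from statistics import quantiles


def fixed_thresholds(reference_values, lower_pct=5, upper_pct=95):
    """Derive a (low, high) threshold pair from a fixed reference period.

    lower_pct and upper_pct are hypothetical defaults, not the task's values.
    """
    # With n=100, quantiles() returns 99 cut points; index k-1 is the k-th percentile.
    cuts = quantiles(reference_values, n=100, method="inclusive")
    return cuts[lower_pct - 1], cuts[upper_pct - 1]


def is_alert(value, thresholds):
    """Flag a value that falls outside the fixed threshold band."""
    low, high = thresholds
    return value < low or value > high


# Hypothetical reference metric (for example, a daily automated-traffic share).
reference = [i / 100 for i in range(101)]
thresholds = fixed_thresholds(reference)
```

Because the thresholds are computed once from the reference period, they stay constant until someone recomputes them, which matches the idea of revisiting them after a set interval.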

Oct 27 2025

Snwachukwu added a comment to T408400: Unable to change input dump path of Airflow commons_structured_data_dump_to_hive_weekly dag.

I'm using this ticket as an opportunity to perform the following upgrades on the airflow dag:

Oct 27 2025, 3:22 PM · Data-Engineering (Q2 FY25/26 October 1st - December 31th)
Snwachukwu updated the task description for T408400: Unable to change input dump path of Airflow commons_structured_data_dump_to_hive_weekly dag.
Oct 27 2025, 3:15 PM · Data-Engineering (Q2 FY25/26 October 1st - December 31th)
Snwachukwu claimed T408400: Unable to change input dump path of Airflow commons_structured_data_dump_to_hive_weekly dag.
Oct 27 2025, 3:09 PM · Data-Engineering (Q2 FY25/26 October 1st - December 31th)
Snwachukwu moved T408400: Unable to change input dump path of Airflow commons_structured_data_dump_to_hive_weekly dag from Next Up to In progress on the Data-Engineering (Q2 FY25/26 October 1st - December 31th) board.
Oct 27 2025, 3:09 PM · Data-Engineering (Q2 FY25/26 October 1st - December 31th)
Snwachukwu created T408400: Unable to change input dump path of Airflow commons_structured_data_dump_to_hive_weekly dag.
Oct 27 2025, 3:09 PM · Data-Engineering (Q2 FY25/26 October 1st - December 31th)

Oct 21 2025

Snwachukwu added a comment to T406882: SDS 1.3.2 Conduct Analysis on Alerting for changes in automated traffic distribution.

@Hghani I think for thresholds, the question is whether we should use a fixed threshold or a rolling threshold. I believe they both have advantages and disadvantages. Traffic changes over time. I suggested we use a mix of both.

Oct 21 2025, 3:46 PM · OKR-Work, Data-Engineering (Q2 FY25/26 October 1st - December 31th)

Oct 17 2025

Snwachukwu added a comment to T406882: SDS 1.3.2 Conduct Analysis on Alerting for changes in automated traffic distribution.

Do the quantile values capture the spikes before May 28th? I ask because May 28th until the first week of June was the absolute peak of the May incident.

Oct 17 2025, 3:02 PM · OKR-Work, Data-Engineering (Q2 FY25/26 October 1st - December 31th)

Oct 16 2025

Snwachukwu moved T365203: [Data Quality] Implement wiki completeness check for MediaWiki History from Ready to Deploy to Done on the Data-Engineering (Q2 FY25/26 October 1st - December 31th) board.
Oct 16 2025, 10:25 PM · Data-Engineering (Q2 FY25/26 October 1st - December 31th), Patch-For-Review, Essential-Work

Oct 15 2025

Snwachukwu added a comment to T406882: SDS 1.3.2 Conduct Analysis on Alerting for changes in automated traffic distribution.

I applied the suggested thresholds above to the old incident data found in wmf.pageview_hourly_backup_2025, but unfortunately they didn't catch the changes in the human-bot ratio. I think it's because the old table only has data from 2025-03-20 onward, by which time the incident had already started. If we had older historical data without the incident, it would help create a more trustworthy baseline for calculating our diff.

Oct 15 2025, 8:27 PM · OKR-Work, Data-Engineering (Q2 FY25/26 October 1st - December 31th)
Snwachukwu changed the status of T406882: SDS 1.3.2 Conduct Analysis on Alerting for changes in automated traffic distribution, a subtask of T407235: SDS 1.3.2 [EPIC] Automated alerting for changes in automated traffic behavior, from Open to In Progress.
Oct 15 2025, 4:59 PM · Data-Engineering, Patch-For-Review, Epic, OKR-Work
Snwachukwu changed the status of T406882: SDS 1.3.2 Conduct Analysis on Alerting for changes in automated traffic distribution from Open to In Progress.
Oct 15 2025, 4:59 PM · OKR-Work, Data-Engineering (Q2 FY25/26 October 1st - December 31th)
Snwachukwu moved T406882: SDS 1.3.2 Conduct Analysis on Alerting for changes in automated traffic distribution from Next Up to In progress on the Data-Engineering (Q2 FY25/26 October 1st - December 31th) board.
Oct 15 2025, 4:57 PM · OKR-Work, Data-Engineering (Q2 FY25/26 October 1st - December 31th)
Snwachukwu updated the task description for T406882: SDS 1.3.2 Conduct Analysis on Alerting for changes in automated traffic distribution.
Oct 15 2025, 4:57 PM · OKR-Work, Data-Engineering (Q2 FY25/26 October 1st - December 31th)
Snwachukwu added a comment to T406882: SDS 1.3.2 Conduct Analysis on Alerting for changes in automated traffic distribution.

A quick summary of 5 weeks of data:
History table used: Pageview hourly
Proposed monitoring/alerting frequency: Daily

Oct 15 2025, 4:54 PM · OKR-Work, Data-Engineering (Q2 FY25/26 October 1st - December 31th)

Oct 14 2025

Snwachukwu moved T407235: SDS 1.3.2 [EPIC] Automated alerting for changes in automated traffic behavior from Next Up to In progress on the Data-Engineering (Q2 FY25/26 October 1st - December 31th) board.
Oct 14 2025, 5:22 PM · Data-Engineering, Patch-For-Review, Epic, OKR-Work
Snwachukwu created T407239: SDS 1.3.2 Implementation.
Oct 14 2025, 1:59 PM · Patch-For-Review, OKR-Work, Data-Engineering (Q2 FY25/26 October 1st - December 31th)
Snwachukwu renamed T407235: SDS 1.3.2 [EPIC] Automated alerting for changes in automated traffic behavior from SDS 1.3.2 EPIC : Automated alerting for changes in automated traffic behavior to SDS 1.3.2 [EPIC] Automated alerting for changes in automated traffic behavior.
Oct 14 2025, 1:33 PM · Data-Engineering, Patch-For-Review, Epic, OKR-Work
Snwachukwu added a parent task for T406882: SDS 1.3.2 Conduct Analysis on Alerting for changes in automated traffic distribution: T407235: SDS 1.3.2 [EPIC] Automated alerting for changes in automated traffic behavior.
Oct 14 2025, 1:33 PM · OKR-Work, Data-Engineering (Q2 FY25/26 October 1st - December 31th)
Snwachukwu added a subtask for T407235: SDS 1.3.2 [EPIC] Automated alerting for changes in automated traffic behavior: T406882: SDS 1.3.2 Conduct Analysis on Alerting for changes in automated traffic distribution.
Oct 14 2025, 1:33 PM · Data-Engineering, Patch-For-Review, Epic, OKR-Work
Snwachukwu created T407235: SDS 1.3.2 [EPIC] Automated alerting for changes in automated traffic behavior.
Oct 14 2025, 1:32 PM · Data-Engineering, Patch-For-Review, Epic, OKR-Work
Snwachukwu renamed T406882: SDS 1.3.2 Conduct Analysis on Alerting for changes in automated traffic distribution from Conduct Analysis on Alerting for changes in automated traffic distribution to SDS 1.3.2 Conduct Analysis on Alerting for changes in automated traffic distribution.
Oct 14 2025, 1:25 PM · OKR-Work, Data-Engineering (Q2 FY25/26 October 1st - December 31th)

Oct 9 2025

Snwachukwu added a comment to T406882: SDS 1.3.2 Conduct Analysis on Alerting for changes in automated traffic distribution.

Current plan:

  1. Understand the variability of pageviews by monitoring the delta of total daily pageviews against a baseline over a period of time.
  2. The suggested baseline is a floating average of pageviews over a period of 7 days.
  3. Monitor this delta for about a month and, from it, pick thresholds for user and bot pageviews.
Oct 9 2025, 4:02 PM · OKR-Work, Data-Engineering (Q2 FY25/26 October 1st - December 31th)
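
A minimal sketch of the delta-against-baseline idea in this plan, assuming the baseline for a given day is the mean of the 7 preceding days (the comment does not say whether the current day is included). The function name and the sample totals are hypothetical.

```python
from collections import deque


def rolling_deltas(daily_pageviews, window=7):
    """Relative change of each day's total against the mean of the preceding
    `window` days. Days without a full window of history are skipped."""
    history = deque(maxlen=window)
    deltas = []
    for views in daily_pageviews:
        if len(history) == window:
            baseline = sum(history) / window
            if baseline > 0:  # avoid dividing by an empty baseline
                deltas.append((views - baseline) / baseline)
        history.append(views)
    return deltas


# Hypothetical daily totals, not real pageview data.
sample = [100, 100, 100, 100, 100, 100, 100, 150, 100]
deltas = rolling_deltas(sample)
```

Running the same function separately on user and bot totals would give one delta series per agent type, from which a threshold could then be chosen after the observation period.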
Snwachukwu updated subscribers of T406882: SDS 1.3.2 Conduct Analysis on Alerting for changes in automated traffic distribution.
Oct 9 2025, 3:14 PM · OKR-Work, Data-Engineering (Q2 FY25/26 October 1st - December 31th)
Snwachukwu added a project to T406882: SDS 1.3.2 Conduct Analysis on Alerting for changes in automated traffic distribution: OKR-Work.
Oct 9 2025, 3:12 PM · OKR-Work, Data-Engineering (Q2 FY25/26 October 1st - December 31th)
Snwachukwu updated the task description for T406882: SDS 1.3.2 Conduct Analysis on Alerting for changes in automated traffic distribution.
Oct 9 2025, 3:12 PM · OKR-Work, Data-Engineering (Q2 FY25/26 October 1st - December 31th)
Snwachukwu created T406882: SDS 1.3.2 Conduct Analysis on Alerting for changes in automated traffic distribution.
Oct 9 2025, 3:07 PM · OKR-Work, Data-Engineering (Q2 FY25/26 October 1st - December 31th)

Oct 2 2025

Snwachukwu moved T365203: [Data Quality] Implement wiki completeness check for MediaWiki History from In Review to Ready to Deploy on the Data-Engineering (Q1 FY25/26 July 1st - September 30th) board.
Oct 2 2025, 12:58 PM · Data-Engineering (Q2 FY25/26 October 1st - December 31th), Patch-For-Review, Essential-Work
Snwachukwu moved T365203: [Data Quality] Implement wiki completeness check for MediaWiki History from In progress to In Review on the Data-Engineering (Q1 FY25/26 July 1st - September 30th) board.
Oct 2 2025, 12:58 PM · Data-Engineering (Q2 FY25/26 October 1st - December 31th), Patch-For-Review, Essential-Work

Sep 24 2025

Snwachukwu moved T365203: [Data Quality] Implement wiki completeness check for MediaWiki History from Next Up to In progress on the Data-Engineering (Q1 FY25/26 July 1st - September 30th) board.
Sep 24 2025, 1:59 PM · Data-Engineering (Q2 FY25/26 October 1st - December 31th), Patch-For-Review, Essential-Work

Apr 3 2025

Snwachukwu moved T372855: Disable Data Platform Engineering generated graphite metrics and dashboards from Blocked/Paused to Done on the Data-Engineering (Q3 2025 January 1st - March 31th) board.
Apr 3 2025, 1:53 PM · Essential-Work, Data-Engineering (Q3 2025 January 1st - March 31th), Data Pipelines, Technical-Debt, SRE Observability (FY2024/2025-Q3), Observability-Metrics

Apr 1 2025

Snwachukwu added a comment to T372855: Disable Data Platform Engineering generated graphite metrics and dashboards.

@AndrewTavis_WMDE This is good news. Thank you so much for the effort put into this.

Apr 1 2025, 5:47 PM · Essential-Work, Data-Engineering (Q3 2025 January 1st - March 31th), Data Pipelines, Technical-Debt, SRE Observability (FY2024/2025-Q3), Observability-Metrics
Snwachukwu created T390727: Canary failure on airflow platform_eng intsance after migrating to Kubernetes.
Apr 1 2025, 1:50 PM · Data-Platform-SRE (2025.04.12 - 2025.05.02), Data-Engineering (Q4 2025 April 1st - June 30th)

Mar 19 2025

Snwachukwu updated subscribers of T365203: [Data Quality] Implement wiki completeness check for MediaWiki History.

Okay @Andrew. Thank you!

Mar 19 2025, 4:38 PM · Data-Engineering (Q2 FY25/26 October 1st - December 31th), Patch-For-Review, Essential-Work
Snwachukwu added a comment to T365203: [Data Quality] Implement wiki completeness check for MediaWiki History.

To solve the missing wikis issue, we decided it's best to automate the sqoop list. There are 3 sources of truth under consideration:

  1. Canonical_data.wikis table (from Wikimedia NOC website). Note there is ongoing work to automate this table (T339928).
  2. Site_creation log website.
  3. Project_namespace_map table.
Mar 19 2025, 2:37 PM · Data-Engineering (Q2 FY25/26 October 1st - December 31th), Patch-For-Review, Essential-Work

Mar 11 2025

Snwachukwu added a comment to T372855: Disable Data Platform Engineering generated graphite metrics and dashboards.

Hi @AndrewTavis_WMDE . Thank you for the confirmation. Please feel free to reach out if you need any form of support.

Mar 11 2025, 1:40 PM · Essential-Work, Data-Engineering (Q3 2025 January 1st - March 31th), Data Pipelines, Technical-Debt, SRE Observability (FY2024/2025-Q3), Observability-Metrics

Mar 10 2025

Snwachukwu added a comment to T372855: Disable Data Platform Engineering generated graphite metrics and dashboards.

@AndrewTavis_WMDE can we work with the date 28th March to finally disable the wikidata metric job in airflow?

Mar 10 2025, 3:15 PM · Essential-Work, Data-Engineering (Q3 2025 January 1st - March 31th), Data Pipelines, Technical-Debt, SRE Observability (FY2024/2025-Q3), Observability-Metrics

Mar 3 2025

Snwachukwu added a comment to T372855: Disable Data Platform Engineering generated graphite metrics and dashboards.

The API metrics have been disabled. The Wikidata metrics are still pending.

Mar 3 2025, 6:32 PM · Essential-Work, Data-Engineering (Q3 2025 January 1st - March 31th), Data Pipelines, Technical-Debt, SRE Observability (FY2024/2025-Q3), Observability-Metrics

Feb 28 2025

Snwachukwu moved T372855: Disable Data Platform Engineering generated graphite metrics and dashboards from In progress to Blocked/Paused on the Data-Engineering (Q3 2025 January 1st - March 31th) board.
Feb 28 2025, 4:18 PM · Essential-Work, Data-Engineering (Q3 2025 January 1st - March 31th), Data Pipelines, Technical-Debt, SRE Observability (FY2024/2025-Q3), Observability-Metrics

Feb 25 2025

Snwachukwu added a comment to T377352: [Update Pipeline] wikidata_coeditors.

@AndrewTavis_WMDE The plan is to go read-only (Graphite) by the end of Q3-FY24/25. We can hold off on turning off the wikidata_metrics_to_graphite_daily_dag.py until towards the end of this quarter.

Feb 25 2025, 3:15 PM · Wikidata Analytics (Radar/Epics/Stalled), Wikidata, Patch-For-Review, DPE Temporary Accounts (Sprint 1)

Feb 20 2025

Snwachukwu added a comment to T377352: [Update Pipeline] wikidata_coeditors.

Hi @AndrewTavis_WMDE, I'm just following up on the wikidata_metrics_to_graphite_daily_dag.py. As part of T372855, can we go ahead and turn this DAG off?

Feb 20 2025, 11:32 PM · Wikidata Analytics (Radar/Epics/Stalled), Wikidata, Patch-For-Review, DPE Temporary Accounts (Sprint 1)

Feb 18 2025

Snwachukwu moved T383743: Identify Internal Users of MediaWiki Wikitext Tables from In progress to Done on the Data-Engineering (Q3 2025 January 1st - March 31th) board.
Feb 18 2025, 3:36 PM · Essential-Work, Data-Engineering (Q3 2025 January 1st - March 31th)
Snwachukwu moved T365203: [Data Quality] Implement wiki completeness check for MediaWiki History from Blocked/Paused to In progress on the Data-Engineering (Q3 2025 January 1st - March 31th) board.
Feb 18 2025, 3:36 PM · Data-Engineering (Q2 FY25/26 October 1st - December 31th), Patch-For-Review, Essential-Work

Jan 29 2025

Snwachukwu added a comment to T383743: Identify Internal Users of MediaWiki Wikitext Tables .

I found no Hive queries or scripts that use these tables. The dumps are currently used by the Research and Platform Engineering teams.

Jan 29 2025, 9:15 PM · Essential-Work, Data-Engineering (Q3 2025 January 1st - March 31th)
Snwachukwu updated the task description for T383743: Identify Internal Users of MediaWiki Wikitext Tables .
Jan 29 2025, 5:16 PM · Essential-Work, Data-Engineering (Q3 2025 January 1st - March 31th)

Jan 25 2025

Snwachukwu updated the task description for T383743: Identify Internal Users of MediaWiki Wikitext Tables .
Jan 25 2025, 2:34 AM · Essential-Work, Data-Engineering (Q3 2025 January 1st - March 31th)
Snwachukwu added a comment to T383743: Identify Internal Users of MediaWiki Wikitext Tables .

I searched the repositories and have pulled up this analysis. It's still a work in progress, as I have yet to identify all the tables/datasets that use the MediaWiki wikitext dumps.

Jan 25 2025, 2:33 AM · Essential-Work, Data-Engineering (Q3 2025 January 1st - March 31th)

Jan 17 2025

Snwachukwu moved T379768: geoeditors_edits_monthly dag and hql from In Progress to Ready to deploy on the DPE Temporary Accounts (Sprint 1) board.
Jan 17 2025, 5:41 PM · DPE Temporary Accounts (Sprint 1)

Nov 27 2024

Snwachukwu moved T379768: geoeditors_edits_monthly dag and hql from Tasked to In Progress on the DPE Temporary Accounts (Sprint 1) board.
Nov 27 2024, 3:37 PM · DPE Temporary Accounts (Sprint 1)
Snwachukwu moved T379769: Private geoeditors: monthly dag and hql from Tasked to In Progress on the DPE Temporary Accounts (Sprint 1) board.
Nov 27 2024, 3:37 PM · DPE Temporary Accounts (Sprint 1)

Nov 26 2024

Snwachukwu moved T377352: [Update Pipeline] wikidata_coeditors from In Review to Blocked / Paused on the DPE Temporary Accounts (Sprint 1) board.
Nov 26 2024, 4:22 AM · Wikidata Analytics (Radar/Epics/Stalled), Wikidata, Patch-For-Review, DPE Temporary Accounts (Sprint 1)
Snwachukwu claimed T379772: Geoeditors Druid load.
Nov 26 2024, 4:19 AM · DPE Temporary Accounts (Sprint 1)

Nov 19 2024

Snwachukwu claimed T379769: Private geoeditors: monthly dag and hql.
Nov 19 2024, 6:33 PM · DPE Temporary Accounts (Sprint 1)

Nov 18 2024

Snwachukwu updated the task description for T379768: geoeditors_edits_monthly dag and hql.
Nov 18 2024, 5:36 PM · DPE Temporary Accounts (Sprint 1)

Nov 14 2024

Snwachukwu claimed T379768: geoeditors_edits_monthly dag and hql.
Nov 14 2024, 5:08 PM · DPE Temporary Accounts (Sprint 1)

Nov 6 2024

Snwachukwu claimed T365203: [Data Quality] Implement wiki completeness check for MediaWiki History.
Nov 6 2024, 1:31 PM · Data-Engineering (Q2 FY25/26 October 1st - December 31th), Patch-For-Review, Essential-Work

Nov 4 2024

Snwachukwu claimed T376752: Add cu_log_event and cu_private_event CheckUser tables to data lake.
Nov 4 2024, 3:11 PM · Data-Engineering (Q2 2024 October 1st - December 31th), Privacy Engineering, CheckUser

Oct 31 2024

Snwachukwu moved T364398: Add MW table 'cu_log' to data lake from In Review to Done on the Data-Engineering (Q2 2024 October 1st - December 31th) board.
Oct 31 2024, 4:24 PM · Data-Engineering (Q2 2024 October 1st - December 31th), Temporary accounts, Data-Platform
Snwachukwu moved T378342: [HAProxy transition] Deploy a staging airflow dag for webrequest refinement from In progress to In Review on the Data-Engineering (Q2 2024 October 1st - December 31th) board.
Oct 31 2024, 4:14 PM · Patch-For-Review, Data-Engineering (Q2 2024 October 1st - December 31th)
Snwachukwu moved T377130: Bump eventutilities to support flink 1.20 from Ready to Deploy to Done on the Data-Engineering (Q2 2024 October 1st - December 31th) board.
Oct 31 2024, 4:09 PM · Data-Engineering (Q2 2024 October 1st - December 31th), Dumps 2.0 (Kanban Board), Discovery-Search (Current work)

Oct 30 2024

Snwachukwu added a comment to T364398: Add MW table 'cu_log' to data lake.

As requested by @jwang and @mpopov, September's data for cu_log is now available in the data lake at the wmf_raw.mediawiki_private_cu_log table. The cu_log table has been added to the list of tables to be sqooped monthly, so expect monthly data as with the other tables.

Oct 30 2024, 4:03 PM · Data-Engineering (Q2 2024 October 1st - December 31th), Temporary accounts, Data-Platform

Oct 21 2024

Snwachukwu created T377770: [Update Pipeline] druid_load_editattemptstep.
Oct 21 2024, 5:41 PM · DPE Temporary Accounts (Sprint 1)

Oct 16 2024

Snwachukwu added a member for DPE Temporary Accounts: Snwachukwu.
Oct 16 2024, 3:55 PM
Snwachukwu claimed T377333: Set up Alerting for Data Quality dags in Airflow..
Oct 16 2024, 2:09 PM · Data-Engineering
Snwachukwu created T377333: Set up Alerting for Data Quality dags in Airflow..
Oct 16 2024, 2:08 PM · Data-Engineering

Oct 8 2024

Snwachukwu updated subscribers of T366836: Migrate Event Platform Schema Respositories to Gitlab.

The switchover has been done. The Gerrit repositories are deprecated (set to read-only), and the schema servers have all been updated with the GitLab URLs with @BTullis's support.

Oct 8 2024, 4:54 PM · Data-Engineering (Q2 2024 October 1st - December 31th)

Oct 1 2024

Snwachukwu added a comment to T366836: Migrate Event Platform Schema Respositories to Gitlab.

We plan to do the switch in 1 week's time, i.e. 8th October, 2024.
Data-Platform-SRE We would need your support to manage merging this patch next Tuesday, on 8th October. We need to make sure the existing checkouts have their git origin changed. Please help confirm availability so I can proceed with notifying everyone of this date.

Oct 1 2024, 2:17 PM · Data-Engineering (Q2 2024 October 1st - December 31th)
Snwachukwu added a comment to T366836: Migrate Event Platform Schema Respositories to Gitlab.

Plan for EventPlatform Schema Migration.

Oct 1 2024, 2:17 PM · Data-Engineering (Q2 2024 October 1st - December 31th)
Snwachukwu added a project to T366836: Migrate Event Platform Schema Respositories to Gitlab: Data-Platform-SRE.
Oct 1 2024, 1:52 PM · Data-Engineering (Q2 2024 October 1st - December 31th)

Sep 25 2024

Snwachukwu added a comment to T366836: Migrate Event Platform Schema Respositories to Gitlab.

The following documents have been updated:

Sep 25 2024, 4:45 PM · Data-Engineering (Q2 2024 October 1st - December 31th)