
Jan 17 2023

EChetty moved T324757: When moving oozie webrequest-load to airflow/spark avoid the error-check corner case from To be discussed /To be estimated to Sprint 07 on the Data Pipelines board.
Jan 17 2023, 11:47 AM · Data-Engineering, Data Pipelines
EChetty moved T325213: Increase mypy coverage in airflow-dags from To be discussed /To be estimated to To be prioritised on the Data Pipelines board.
Jan 17 2023, 11:47 AM · Data-Engineering, Data Pipelines
EChetty moved T320860: Fix mediawiki-history page computation for deleted pages having the same title from To be discussed /To be estimated to To be prioritised on the Data Pipelines board.
Jan 17 2023, 11:46 AM · Data-Engineering, Data Pipelines
EChetty moved T325103: Prune raw HDFS FSImages stored on HDFS from To be discussed /To be estimated to To be prioritised on the Data Pipelines board.
Jan 17 2023, 11:46 AM · Data-Engineering, Data Pipelines
EChetty triaged T325185: [Airflow] Implement a NotebookOperator as High priority.
Jan 17 2023, 11:46 AM · Data-Engineering, Data Pipelines
EChetty moved T325185: [Airflow] Implement a NotebookOperator from To be discussed /To be estimated to Backlog on the Data Pipelines board.
Jan 17 2023, 11:46 AM · Data-Engineering, Data Pipelines
EChetty moved T325195: Set up a repository to generate packaged conda environments via CI for Jupyter notebooks from To be discussed /To be estimated to To be prioritised on the Data Pipelines board.
Jan 17 2023, 11:46 AM · Data-Engineering, Data Pipelines
EChetty moved T326193: Airflow upgrade (refactor deb creation + version bump + switch to PostgreSQL) from To be discussed /To be estimated to To be prioritised on the Data Pipelines board.
Jan 17 2023, 11:46 AM · Data Pipelines
EChetty moved T325611: Add TikTok's in-app browser to ua-parser library from To be discussed /To be estimated to To be prioritised on the Data Pipelines board.
Jan 17 2023, 11:46 AM · Data-Engineering, Data Pipelines, Product-Analytics
EChetty moved T325306: Provide aggregated user device data per-country from To be discussed /To be estimated to To be prioritised on the Data Pipelines board.
Jan 17 2023, 11:46 AM · Data-Engineering
EChetty moved T326330: Update sqoop for CheckUser table from To be discussed /To be estimated to To be prioritised on the Data Pipelines board.
Jan 17 2023, 11:46 AM · Data Pipelines (Sprint 07), Data-Engineering-Planning, Patch-For-Review
EChetty moved T324486: [Migration] migrate simple oozie jobs from To be discussed /To be estimated to To be prioritised on the Data Pipelines board.
Jan 17 2023, 11:46 AM · Data-Engineering, Data Pipelines
EChetty moved T58628: Non-mobile UAs on mobile (2g/gprs, etc) IP-blocks from To be discussed /To be estimated to Discussed (Radar) on the Data Pipelines board.
Jan 17 2023, 11:46 AM · Data-Engineering, Data-Engineering-Wikistats
EChetty set the point value for T311229: Drop MediaViewer and MultimediaViewer* tables to 1.
Jan 17 2023, 11:45 AM · Data-Engineering, Data Pipelines
EChetty set the point value for T324757: When moving oozie webrequest-load to airflow/spark avoid the error-check corner case to 3.
Jan 17 2023, 11:45 AM · Data-Engineering, Data Pipelines
EChetty moved T325181: Present "Notebooks in Airflow" solution to PA and discuss ownership of different steps from To be discussed /To be estimated to To be prioritised on the Data Pipelines board.
Jan 17 2023, 11:45 AM · Data-Engineering, Product-Analytics
EChetty moved T323905: [M] Automate Airflow DAG release from To be discussed /To be estimated to Discussed (Radar) on the Data Pipelines board.
Jan 17 2023, 11:44 AM · Structured-Data-Backlog (Current Work), Data Pipelines
EChetty set the point value for T325103: Prune raw HDFS FSImages stored on HDFS to 1.
Jan 17 2023, 11:44 AM · Data-Engineering, Data Pipelines
EChetty set the point value for T325195: Set up a repository to generate packaged conda environments via CI for Jupyter notebooks to 5.
Jan 17 2023, 11:44 AM · Data-Engineering, Data Pipelines
EChetty set the point value for T325213: Increase mypy coverage in airflow-dags to 3.
Jan 17 2023, 11:43 AM · Data-Engineering, Data Pipelines
EChetty set the point value for T325611: Add TikTok's in-app browser to ua-parser library to 3.
Jan 17 2023, 11:42 AM · Data-Engineering, Data Pipelines, Product-Analytics
EChetty set the point value for T326330: Update sqoop for CheckUser table to 3.
Jan 17 2023, 11:41 AM · Data Pipelines (Sprint 07), Data-Engineering-Planning, Patch-For-Review
EChetty moved T323456: NEW FEATURE REQUEST: sqoop (all) user properties from mariadb to wmf_raw.mediawiki_user_properties from Sprint 05-06 to Next Up (revisit every 2 sprints) on the Data Pipelines board.
Jan 17 2023, 11:41 AM · Data-Engineering, Data Pipelines
EChetty moved T322036: Implement periodical cleaning of Airflow databases from Sprint 05-06 to Next Up (revisit every 2 sprints) on the Data Pipelines board.
Jan 17 2023, 11:41 AM · Data-Platform-SRE, Data-Engineering
EChetty moved T316896: Review why total_edits on Mediawiki_History differs from the total_edits on Editors_Daily from Sprint 05-06 to Next Up (revisit every 2 sprints) on the Data Pipelines board.
Jan 17 2023, 11:41 AM · Data-Engineering, Data Pipelines, Product-Analytics
EChetty moved T318346: Add Python Linter Checks to CI from Sprint 05-06 to Next Up (revisit every 2 sprints) on the Data Pipelines board.
Jan 17 2023, 11:41 AM · Patch-For-Review, Data Pipelines (Sprint 14), Data-Engineering-Planning
EChetty moved T321838: Back-fill Wikidata reliability Graphite metrics from Sprint 05-06 to Next Up (revisit every 2 sprints) on the Data Pipelines board.
Jan 17 2023, 11:41 AM · Data-Engineering, Data Pipelines
EChetty moved T307508: Migrate 1+ Druid load jobs from Sprint 05-06 to Next Up (revisit every 2 sprints) on the Data Pipelines board.
Jan 17 2023, 11:40 AM · Data Pipelines, Data-Engineering
EChetty moved T324488: [SPIKE] Webrequest migration from Sprint 05-06 to Next Up (revisit every 2 sprints) on the Data Pipelines board.
Jan 17 2023, 11:40 AM · Data-Engineering, Data Pipelines, Spike
EChetty moved T326658: Document Impact of Jan 8&9 Traffic Data Loss from Sprint 05-06 to Next Up (revisit every 2 sprints) on the Data Pipelines board.
Jan 17 2023, 11:40 AM · Data Pipelines (Sprint 08), SRE, Traffic
EChetty moved T327074: Update wmf.webrequest table to use a new column for referer data. from Sprint 05-06 to Sprint 07 on the Data Pipelines board.
Jan 17 2023, 11:40 AM · Patch-For-Review, Data Pipelines (Sprint 08), Metrics Platform Backlog, Foundational Technology Requests
EChetty closed T301403: Investigate wikimedia and wikidata unique devices per-project-family overcount offset as Resolved.
Jan 17 2023, 11:39 AM · Data Pipelines (Sprint 05-06), Data-Engineering-Planning, Product-Analytics
EChetty closed T323951: Table Cleanup - Drop Unused tables as Resolved.
Jan 17 2023, 11:39 AM · Data Pipelines (Sprint 05-06)
EChetty closed T321167: Prepare the fsimage , a subtask of T261283: Productionize HDFS fsimage data analysis job, as Resolved.
Jan 17 2023, 11:39 AM · Data-Engineering, Data Pipelines, Patch-For-Review, Technical-Debt
EChetty closed T321167: Prepare the fsimage as Resolved.
Jan 17 2023, 11:39 AM · Data Pipelines (Sprint 05-06), Patch-For-Review, Technical-Debt
EChetty closed T321168: Create and deploy the fsimage job., a subtask of T261283: Productionize HDFS fsimage data analysis job, as Resolved.
Jan 17 2023, 11:38 AM · Data-Engineering, Data Pipelines, Patch-For-Review, Technical-Debt
EChetty closed T321168: Create and deploy the fsimage job. as Resolved.
Jan 17 2023, 11:38 AM · Data Pipelines (Sprint 05-06), Patch-For-Review, Technical-Debt
EChetty closed T321169: Create a dashboard from the fsImage Dataset extracted from the HDFS FsImage, a subtask of T261283: Productionize HDFS fsimage data analysis job, as Resolved.
Jan 17 2023, 11:38 AM · Data-Engineering, Data Pipelines, Patch-For-Review, Technical-Debt
EChetty closed T321169: Create a dashboard from the fsImage Dataset extracted from the HDFS FsImage as Resolved.
Jan 17 2023, 11:38 AM · Data Pipelines (Sprint 05-06), Patch-For-Review, Technical-Debt
EChetty closed T322534: Spike: Product Analytics ETL options - Timebox 1 Sprint., a subtask of T322532: Notebook Scheduler for Product Analytics, as Resolved.
Jan 17 2023, 11:38 AM · Data-Engineering, Epic, Data Pipelines
EChetty closed T322534: Spike: Product Analytics ETL options - Timebox 1 Sprint. as Resolved.
Jan 17 2023, 11:38 AM · Data Pipelines (Sprint 05-06)
EChetty closed T324850: HDFS data usage pipeline change following deployment (Airflow has no access to hdfs.keytab), a subtask of T261283: Productionize HDFS fsimage data analysis job, as Resolved.
Jan 17 2023, 11:38 AM · Data-Engineering, Data Pipelines, Patch-For-Review, Technical-Debt
EChetty closed T324850: HDFS data usage pipeline change following deployment (Airflow has no access to hdfs.keytab) as Resolved.
Jan 17 2023, 11:38 AM · Data Pipelines (Sprint 05-06)
EChetty moved T323951: Table Cleanup - Drop Unused tables from Incident/Unexpected work to Done on the Data Pipelines (Sprint 05-06) board.
Jan 17 2023, 11:37 AM · Data Pipelines (Sprint 05-06)
EChetty moved T323951: Table Cleanup - Drop Unused tables from Done to Incident/Unexpected work on the Data Pipelines (Sprint 05-06) board.
Jan 17 2023, 11:37 AM · Data Pipelines (Sprint 05-06)
EChetty moved T326339: Use uap-core browser-family for bot detection from To be discussed /To be estimated to Next Up (revisit every 2 sprints) on the Data Pipelines board.
Jan 17 2023, 10:52 AM · Data-Engineering, Data Pipelines

Jan 16 2023

EChetty triaged T320860: Fix mediawiki-history page computation for deleted pages having the same title as Low priority.
Jan 16 2023, 4:50 PM · Data-Engineering, Data Pipelines
EChetty set the point value for T323662: NEW FEATURE REQUEST: Dataset with active and non-active Wikis to 5.
Jan 16 2023, 4:46 PM · Data-Engineering, Data Pipelines
EChetty set the point value for T325185: [Airflow] Implement a NotebookOperator to 1.
Jan 16 2023, 4:39 PM · Data-Engineering, Data Pipelines
EChetty set the point value for T325306: Provide aggregated user device data per-country to 9.
Jan 16 2023, 4:36 PM · Data-Engineering
EChetty set the point value for T326193: Airflow upgrade (refactor deb creation + version bump + switch to PostgreSQL) to 5.
Jan 16 2023, 4:29 PM · Data Pipelines
EChetty set the point value for T326339: Use uap-core browser-family for bot detection to 5.
Jan 16 2023, 4:27 PM · Data-Engineering, Data Pipelines
EChetty set the point value for T327072: Java Prep for Webrequest Load to 9.
Jan 16 2023, 4:22 PM · Patch-For-Review, Data Pipelines (sprint 10)
EChetty set the point value for T324485: [Airflow] Migrate Druid loading Oozie jobs - Parent task to 9.
Jan 16 2023, 4:18 PM · Data Pipelines (Sprint 14)
EChetty moved T309996: [Airflow] Build Druid Operator from Next Up to In Progress on the Data Pipelines (Sprint 07) board.
Jan 16 2023, 4:15 PM · Data Pipelines (Sprint 08), Data-Engineering-Planning
EChetty moved T326195: Edit puppet code to provide Airflow the PostgreSQL connection from Ready to In Review on the Data Pipelines (Sprint 07) board.
Jan 16 2023, 4:14 PM · Data Pipelines (Sprint 11)
EChetty moved T309769: Expanding External Referrer Tracking from Next Up to In Progress on the Data Pipelines (Sprint 07) board.
Jan 16 2023, 4:14 PM · Data Pipelines (Sprint 08), Metrics Platform Backlog, Foundational Technology Requests
EChetty moved T309552: Update Airflow DAGs code to make it compatible with version V2.3.4 of Airflow from Ready to In Review on the Data Pipelines (Sprint 07) board.
Jan 16 2023, 4:14 PM · Data Pipelines (Sprint 07), Data-Engineering-Planning
EChetty moved T309769: Expanding External Referrer Tracking from Ready to Next Up on the Data Pipelines (Sprint 07) board.
Jan 16 2023, 4:14 PM · Data Pipelines (Sprint 08), Metrics Platform Backlog, Foundational Technology Requests
EChetty moved T309996: [Airflow] Build Druid Operator from Ready to Next Up on the Data Pipelines (Sprint 07) board.
Jan 16 2023, 4:14 PM · Data Pipelines (Sprint 08), Data-Engineering-Planning
EChetty moved T315580: Upgrade Puppet code to make Airflow configuration files compatible with version 2.5.0 from Ready to In Review on the Data Pipelines (Sprint 07) board.
Jan 16 2023, 4:14 PM · Data Pipelines (Sprint 11), Vuln-VulnComponent, SecTeam-Processed, Data-Engineering-Planning
EChetty moved T324995: Include EU Registered Country in the canonical country database from Ready to In Review on the Data Pipelines (Sprint 07) board.
Jan 16 2023, 4:14 PM · Product-Analytics (Kanban), Data Pipelines (Sprint 07), Data-Engineering-Planning
EChetty moved T324485: [Airflow] Migrate Druid loading Oozie jobs - Parent task from Ready to Next Up on the Data Pipelines (Sprint 07) board.
Jan 16 2023, 4:13 PM · Data Pipelines (Sprint 14)
EChetty moved T324483: [Migration] Pageview - Learning from Ready to Next Up on the Data Pipelines (Sprint 07) board.
Jan 16 2023, 4:13 PM · Data Pipelines (Sprint 08)
EChetty moved T327072: Java Prep for Webrequest Load from Ready to Next Up on the Data Pipelines (Sprint 07) board.
Jan 16 2023, 4:13 PM · Patch-For-Review, Data Pipelines (sprint 10)
EChetty moved T323614: [M] Reduce image_suggestion HDFS files footprint from Sprint 05-06 to Next Up (revisit every 2 sprints) on the Data Pipelines board.
Jan 16 2023, 4:13 PM · Data Pipelines, Structured-Data-Backlog (Current Work), Image-Suggestions
EChetty moved T326195: Edit puppet code to provide Airflow the PostgreSQL connection from Sprint 05-06 to Sprint 07 on the Data Pipelines board.
Jan 16 2023, 4:12 PM · Data Pipelines (Sprint 11)
EChetty moved T315580: Upgrade Puppet code to make Airflow configuration files compatible with version 2.5.0 from Sprint 05-06 to Sprint 07 on the Data Pipelines board.
Jan 16 2023, 4:11 PM · Data Pipelines (Sprint 11), Vuln-VulnComponent, SecTeam-Processed, Data-Engineering-Planning
EChetty moved T326195: Edit puppet code to provide Airflow the PostgreSQL connection from Done to In Review on the Data Pipelines (Sprint 05-06) board.
Jan 16 2023, 4:11 PM · Data Pipelines (Sprint 11)
EChetty moved T326195: Edit puppet code to provide Airflow the PostgreSQL connection from In Progress to Done on the Data Pipelines (Sprint 05-06) board.
Jan 16 2023, 4:11 PM · Data Pipelines (Sprint 11)
EChetty moved T309552: Update Airflow DAGs code to make it compatible with version V2.3.4 of Airflow from Sprint 05-06 to Sprint 07 on the Data Pipelines board.
Jan 16 2023, 4:10 PM · Data Pipelines (Sprint 07), Data-Engineering-Planning
EChetty moved T309996: [Airflow] Build Druid Operator from Sprint 05-06 to Sprint 07 on the Data Pipelines board.
Jan 16 2023, 4:07 PM · Data Pipelines (Sprint 08), Data-Engineering-Planning
EChetty moved T309769: Expanding External Referrer Tracking from Sprint 05-06 to Sprint 07 on the Data Pipelines board.
Jan 16 2023, 4:07 PM · Data Pipelines (Sprint 08), Metrics Platform Backlog, Foundational Technology Requests
EChetty moved T324995: Include EU Registered Country in the canonical country database from Sprint 05-06 to Sprint 07 on the Data Pipelines board.
Jan 16 2023, 4:06 PM · Product-Analytics (Kanban), Data Pipelines (Sprint 07), Data-Engineering-Planning
EChetty created T327083: End to end test of the Java MPC.
Jan 16 2023, 3:25 PM · Metrics Platform Backlog (Metrics Platform Kanban)
EChetty moved T281773: Publish librarized Metrics Platform Java client to Maven from Ready to Next Up on the Metrics Platform Backlog (Metrics Platform Kanban) board.
Jan 16 2023, 3:23 PM · Metrics Platform Backlog (Metrics Platform Kanban), Product-Data-Infrastructure
EChetty moved T325240: Re-enable package cycle check in Java MPC from Ready to Next Up on the Metrics Platform Backlog (Metrics Platform Kanban) board.
Jan 16 2023, 3:22 PM · Metrics Platform Backlog (Metrics Platform Kanban)
EChetty assigned T327072: Java Prep for Webrequest Load to Antoine_Quhen.
Jan 16 2023, 2:44 PM · Patch-For-Review, Data Pipelines (sprint 10)
EChetty moved T327072: Java Prep for Webrequest Load from To be prioritised to Sprint 07 on the Data Pipelines board.
Jan 16 2023, 2:43 PM · Patch-For-Review, Data Pipelines (sprint 10)
EChetty moved T324483: [Migration] Pageview - Learning from To be prioritised to Sprint 07 on the Data Pipelines board.
Jan 16 2023, 2:43 PM · Data Pipelines (Sprint 08)
EChetty moved T324485: [Airflow] Migrate Druid loading Oozie jobs - Parent task from To be discussed /To be estimated to Sprint 07 on the Data Pipelines board.
Jan 16 2023, 2:42 PM · Data Pipelines (Sprint 14)
EChetty assigned T324485: [Airflow] Migrate Druid loading Oozie jobs - Parent task to mforns.
Jan 16 2023, 2:42 PM · Data Pipelines (Sprint 14)
EChetty updated Other Assignee for T327072: Java Prep for Webrequest Load, added: Antoine_Quhen.
Jan 16 2023, 2:42 PM · Patch-For-Review, Data Pipelines (sprint 10)
EChetty moved T327073: Write Airflow DAG to move the webrequest load job to airflow. from Backlog to To be prioritised on the Data Pipelines board.
Jan 16 2023, 2:41 PM · Data Pipelines (Sprint 11), Patch-For-Review
EChetty moved T327072: Java Prep for Webrequest Load from Backlog to To be prioritised on the Data Pipelines board.
Jan 16 2023, 2:41 PM · Patch-For-Review, Data Pipelines (sprint 10)
EChetty created T327073: Write Airflow DAG to move the webrequest load job to airflow..
Jan 16 2023, 2:41 PM · Data Pipelines (Sprint 11), Patch-For-Review
EChetty created T327072: Java Prep for Webrequest Load.
Jan 16 2023, 2:39 PM · Patch-For-Review, Data Pipelines (sprint 10)
EChetty assigned T324483: [Migration] Pageview - Learning to JAllemandou.
Jan 16 2023, 2:29 PM · Data Pipelines (Sprint 08)
EChetty assigned T324486: [Migration] migrate simple oozie jobs to mforns.
Jan 16 2023, 2:29 PM · Data-Engineering, Data Pipelines
EChetty archived Data Pipelines (Sprint 04).
Jan 16 2023, 1:59 PM
EChetty created Data Pipelines (Sprint 07).
Jan 16 2023, 1:59 PM

Jan 11 2023

EChetty moved T309769: Expanding External Referrer Tracking from In Review to In Progress on the Data Pipelines (Sprint 05-06) board.
Jan 11 2023, 5:05 PM · Data Pipelines (Sprint 08), Metrics Platform Backlog, Foundational Technology Requests
EChetty assigned T324011: SPIKE: Spin up a Test Trino instance (Evaluate Trino) to Stevemunene.
Jan 11 2023, 1:41 PM · Data-Platform-SRE
EChetty moved T324011: SPIKE: Spin up a Test Trino instance (Evaluate Trino) from Next Up to In Progress on the Shared-Data-Infrastructure (EQ2 Kanban (Sprints 04-07)) board.
Jan 11 2023, 1:40 PM · Data-Platform-SRE

Jan 10 2023

EChetty moved T286344: Remove StreamConfig::INTERNAL_SETTINGS logic from EventStreamConfig and do it in EventLogging client instead from Metrics Platform Kanban to Tracking on the Metrics Platform Backlog board.
Jan 10 2023, 4:10 PM · Data Engineering and Event Platform Team, Data-Engineering, Metrics Platform Backlog (Metrics Platform Kanban), MW-1.41-notes (1.41.0-wmf.2; 2023-03-27), MW-1.40-notes (1.40.0-wmf.24; 2023-02-20), Patch-For-Review, Event-Platform
EChetty moved T295619: Cookie value sent in HTTP requests changes too frequently from Metrics Platform Kanban to Tracking on the Metrics Platform Backlog board.
Jan 10 2023, 4:09 PM · Wikimedia-Performance-recommendation, Metrics Platform Backlog, MediaWiki-Platform-Team (Radar), Product-Analytics, MediaWiki-extensions-WikimediaEvents
EChetty moved T267408: [MEP Client Library] Write User-facing Documentation from Metrics Platform Kanban to Tracking on the Metrics Platform Backlog board.
Jan 10 2023, 4:05 PM · Metrics Platform Backlog, Documentation, Product-Data-Infrastructure, Better Use Of Data
EChetty moved T309013: EditAttemptStep Migration to (monoschema) MP from Ready to Deploy to Done on the Metrics Platform Backlog (Metrics Platform Kanban) board.
Jan 10 2023, 3:46 PM · Metrics Platform Backlog (Metrics Platform Kanban), MW-1.39-notes (1.39.0-wmf.25; 2022-08-15), Editing-team, DiscussionTools
EChetty moved T323458: NEW FEATURE REQUEST: Upgrade superset to 1.5.3 from In Review to Read to Deploy on the Shared-Data-Infrastructure (EQ2 Kanban (Sprints 04-07)) board.
Jan 10 2023, 2:09 PM · Patch-For-Review, Shared-Data-Infrastructure (EQ2 Kanban (Sprints 04-07)), Data-Engineering-Planning

Jan 9 2023

EChetty moved T323614: [M] Reduce image_suggestion HDFS files footprint from Ready to In Progress on the Data Pipelines (Sprint 05-06) board.
Jan 9 2023, 4:18 PM · Data Pipelines, Structured-Data-Backlog (Current Work), Image-Suggestions