
Jan 18 2023

EChetty created T327262: DSE Experiment - User Story 4 (Machine Learning Use Case).
Jan 18 2023, 11:54 AM · Shared-Data-Infrastructure, Epic
EChetty created T327259: Enable the Container Storage Interface (CSI) and the Ceph CSI plugin on dse-k8s cluster.
Jan 18 2023, 11:50 AM · Data-Platform-SRE (2024.06.17 - 2024.07.07), Patch-For-Review
EChetty created T327258: DSE Experiment - User Story 2 (Make Compute available).
Jan 18 2023, 11:48 AM · Shared-Data-Infrastructure, Epic
EChetty created T327257: DSE Experiment - User Story 1 (Address Kerberos).
Jan 18 2023, 11:46 AM · Shared-Data-Infrastructure, Epic

Jan 17 2023

EChetty moved T324485: [Airflow] Migrate Druid loading Oozie jobs - Parent task from Next Up to In Progress on the Data Pipelines (Sprint 07) board.
Jan 17 2023, 5:10 PM · Data Pipelines (Sprint 14)
EChetty moved T327074: Update wmf.webrequest table to use a new column for referer data. from Ready to In Progress on the Data Pipelines (Sprint 07) board.
Jan 17 2023, 5:08 PM · Patch-For-Review, Data Pipelines (Sprint 08), Metrics Platform Backlog, Foundational Technology Requests
EChetty moved T309769: Expanding External Referrer Tracking from In Progress to In Review on the Data Pipelines (Sprint 07) board.
Jan 17 2023, 5:08 PM · Data Pipelines (Sprint 08), Metrics Platform Backlog, Foundational Technology Requests
EChetty moved T326195: Edit puppet code to provide Airflow the PostgreSQL connection from In Review to In Progress on the Data Pipelines (Sprint 07) board.
Jan 17 2023, 5:08 PM · Data Pipelines (Sprint 11)
EChetty moved T311229: Drop MediaViewer and MultimediaViewer* tables from Sprint 07 to Next Up (revisit every 2 sprints) on the Data Pipelines board.
Jan 17 2023, 5:01 PM · Data-Engineering, Data Pipelines
EChetty moved T326330: Update sqoop for CheckUser table from To be prioritised to Sprint 07 on the Data Pipelines board.
Jan 17 2023, 5:00 PM · Data Pipelines (Sprint 07), Data-Engineering-Planning, Patch-For-Review
EChetty added a comment to T323662: NEW FEATURE REQUEST: Dataset with active and non-active Wikis.

Do we have an existing definition of "active" that we want to use here?

Jan 17 2023, 1:04 PM · Data-Engineering, Data Pipelines
EChetty moved T325306: Provide aggregated user device data per-country from To be prioritised to Discussed (Radar) on the Data Pipelines board.
Jan 17 2023, 12:53 PM · Data-Engineering
EChetty set the point value for T326195: Edit puppet code to provide Airflow the PostgreSQL connection to 3.
Jan 17 2023, 11:50 AM · Data Pipelines (Sprint 11)
EChetty moved T325181: Present "Notebooks in Airflow" solution to PA and discuss ownership of different steps from To be prioritised to Discussed (Radar) on the Data Pipelines board.
Jan 17 2023, 11:49 AM · Data-Engineering, Product-Analytics
EChetty claimed T325181: Present "Notebooks in Airflow" solution to PA and discuss ownership of different steps.
Jan 17 2023, 11:49 AM · Data-Engineering, Product-Analytics
EChetty triaged T325181: Present "Notebooks in Airflow" solution to PA and discuss ownership of different steps as High priority.
Jan 17 2023, 11:48 AM · Data-Engineering, Product-Analytics
EChetty set the point value for T327073: Write Airflow DAG to move the webrequest load job to airflow. to 3.
Jan 17 2023, 11:48 AM · Data Pipelines (Sprint 11), Patch-For-Review
EChetty moved T324482: [Migration] Oozie Migration jobs for Pageviews from To be prioritised to Sprint 07 on the Data Pipelines board.
Jan 17 2023, 11:48 AM · Data Pipelines (sprint 10), Patch-For-Review
EChetty set the point value for T324486: [Migration] migrate simple oozie jobs to 5.
Jan 17 2023, 11:47 AM · Data-Engineering, Data Pipelines
EChetty moved T311229: Drop MediaViewer and MultimediaViewer* tables from To be prioritised to Sprint 07 on the Data Pipelines board.
Jan 17 2023, 11:47 AM · Data-Engineering, Data Pipelines
EChetty moved T325103: Prune raw HDFS FSImages stored on HDFS from To be prioritised to Sprint 07 on the Data Pipelines board.
Jan 17 2023, 11:47 AM · Data-Engineering, Data Pipelines
EChetty moved T311229: Drop MediaViewer and MultimediaViewer* tables from To be discussed /To be estimated to To be prioritised on the Data Pipelines board.
Jan 17 2023, 11:47 AM · Data-Engineering, Data Pipelines
EChetty triaged T323662: NEW FEATURE REQUEST: Dataset with active and non-active Wikis as Medium priority.
Jan 17 2023, 11:47 AM · Data-Engineering, Data Pipelines
EChetty moved T323662: NEW FEATURE REQUEST: Dataset with active and non-active Wikis from To be discussed /To be estimated to To be prioritised on the Data Pipelines board.
Jan 17 2023, 11:47 AM · Data-Engineering, Data Pipelines
EChetty moved T324757: When moving oozie webrequest-load to airflow/spark avoid the error-check corner case from To be discussed /To be estimated to Sprint 07 on the Data Pipelines board.
Jan 17 2023, 11:47 AM · Data-Engineering, Data Pipelines
EChetty moved T325213: Increase mypy coverage in airflow-dags from To be discussed /To be estimated to To be prioritised on the Data Pipelines board.
Jan 17 2023, 11:47 AM · Data-Engineering, Data Pipelines
EChetty moved T320860: Fix mediawiki-history page computation for deleted pages having the same title from To be discussed /To be estimated to To be prioritised on the Data Pipelines board.
Jan 17 2023, 11:46 AM · Data-Engineering, Data Pipelines
EChetty moved T325103: Prune raw HDFS FSImages stored on HDFS from To be discussed /To be estimated to To be prioritised on the Data Pipelines board.
Jan 17 2023, 11:46 AM · Data-Engineering, Data Pipelines
EChetty triaged T325185: [Airflow] Implement a NotebookOperator as High priority.
Jan 17 2023, 11:46 AM · Data-Engineering, Data Pipelines
EChetty moved T325185: [Airflow] Implement a NotebookOperator from To be discussed /To be estimated to Backlog on the Data Pipelines board.
Jan 17 2023, 11:46 AM · Data-Engineering, Data Pipelines
EChetty moved T325195: Set up a repository to generate packaged conda environments via CI for Jupyter notebooks from To be discussed /To be estimated to To be prioritised on the Data Pipelines board.
Jan 17 2023, 11:46 AM · Data-Engineering, Data Pipelines
EChetty moved T326193: Airflow upgrade (refactor deb creation + version bump + switch to PostgreSQL) from To be discussed /To be estimated to To be prioritised on the Data Pipelines board.
Jan 17 2023, 11:46 AM · Data Pipelines
EChetty moved T325611: Add TikTok's in-app browser to ua-parser library from To be discussed /To be estimated to To be prioritised on the Data Pipelines board.
Jan 17 2023, 11:46 AM · Data-Engineering, Data Pipelines, Product-Analytics
EChetty moved T325306: Provide aggregated user device data per-country from To be discussed /To be estimated to To be prioritised on the Data Pipelines board.
Jan 17 2023, 11:46 AM · Data-Engineering
EChetty moved T326330: Update sqoop for CheckUser table from To be discussed /To be estimated to To be prioritised on the Data Pipelines board.
Jan 17 2023, 11:46 AM · Data Pipelines (Sprint 07), Data-Engineering-Planning, Patch-For-Review
EChetty moved T324486: [Migration] migrate simple oozie jobs from To be discussed /To be estimated to To be prioritised on the Data Pipelines board.
Jan 17 2023, 11:46 AM · Data-Engineering, Data Pipelines
EChetty moved T58628: Non-mobile UAs on mobile (2g/gprs, etc) IP-blocks from To be discussed /To be estimated to Discussed (Radar) on the Data Pipelines board.
Jan 17 2023, 11:46 AM · Data-Engineering, Data-Engineering-Wikistats
EChetty set the point value for T311229: Drop MediaViewer and MultimediaViewer* tables to 1.
Jan 17 2023, 11:45 AM · Data-Engineering, Data Pipelines
EChetty set the point value for T324757: When moving oozie webrequest-load to airflow/spark avoid the error-check corner case to 3.
Jan 17 2023, 11:45 AM · Data-Engineering, Data Pipelines
EChetty moved T325181: Present "Notebooks in Airflow" solution to PA and discuss ownership of different steps from To be discussed /To be estimated to To be prioritised on the Data Pipelines board.
Jan 17 2023, 11:45 AM · Data-Engineering, Product-Analytics
EChetty moved T323905: [M] Automate Airflow DAG release from To be discussed /To be estimated to Discussed (Radar) on the Data Pipelines board.
Jan 17 2023, 11:44 AM · Structured-Data-Backlog (Current Work), Data Pipelines
EChetty set the point value for T325103: Prune raw HDFS FSImages stored on HDFS to 1.
Jan 17 2023, 11:44 AM · Data-Engineering, Data Pipelines
EChetty set the point value for T325195: Set up a repository to generate packaged conda environments via CI for Jupyter notebooks to 5.
Jan 17 2023, 11:44 AM · Data-Engineering, Data Pipelines
EChetty set the point value for T325213: Increase mypy coverage in airflow-dags to 3.
Jan 17 2023, 11:43 AM · Data-Engineering, Data Pipelines
EChetty set the point value for T325611: Add TikTok's in-app browser to ua-parser library to 3.
Jan 17 2023, 11:42 AM · Data-Engineering, Data Pipelines, Product-Analytics
EChetty set the point value for T326330: Update sqoop for CheckUser table to 3.
Jan 17 2023, 11:41 AM · Data Pipelines (Sprint 07), Data-Engineering-Planning, Patch-For-Review
EChetty moved T323456: NEW FEATURE REQUEST: sqoop (all) user properties from mariadb to wmf_raw.mediawiki_user_properties from Sprint 05-06 to Next Up (revisit every 2 sprints) on the Data Pipelines board.
Jan 17 2023, 11:41 AM · Data-Engineering, Data Pipelines
EChetty moved T322036: Implement periodical cleaning of Airflow databases from Sprint 05-06 to Next Up (revisit every 2 sprints) on the Data Pipelines board.
Jan 17 2023, 11:41 AM · Data-Platform-SRE, Data-Engineering
EChetty moved T316896: Review why total_edits on Mediawiki_History differs from the total_edits on Editors_Daily from Sprint 05-06 to Next Up (revisit every 2 sprints) on the Data Pipelines board.
Jan 17 2023, 11:41 AM · Data-Engineering, Data Pipelines, Product-Analytics
EChetty moved T318346: Add Python Linter Checks to CI from Sprint 05-06 to Next Up (revisit every 2 sprints) on the Data Pipelines board.
Jan 17 2023, 11:41 AM · Patch-For-Review, Data Pipelines (Sprint 14), Data-Engineering-Planning
EChetty moved T321838: Back-fill Wikidata reliability Graphite metrics from Sprint 05-06 to Next Up (revisit every 2 sprints) on the Data Pipelines board.
Jan 17 2023, 11:41 AM · Data-Engineering, Data Pipelines
EChetty moved T307508: Migrate 1+ Druid load jobs from Sprint 05-06 to Next Up (revisit every 2 sprints) on the Data Pipelines board.
Jan 17 2023, 11:40 AM · Data Pipelines, Data-Engineering
EChetty moved T324488: [SPIKE] Webrequest migration from Sprint 05-06 to Next Up (revisit every 2 sprints) on the Data Pipelines board.
Jan 17 2023, 11:40 AM · Data-Engineering, Data Pipelines, Spike
EChetty moved T326658: Document Impact of Jan 8&9 Traffic Data Loss from Sprint 05-06 to Next Up (revisit every 2 sprints) on the Data Pipelines board.
Jan 17 2023, 11:40 AM · Data Pipelines (Sprint 08), SRE, Traffic
EChetty moved T327074: Update wmf.webrequest table to use a new column for referer data. from Sprint 05-06 to Sprint 07 on the Data Pipelines board.
Jan 17 2023, 11:40 AM · Patch-For-Review, Data Pipelines (Sprint 08), Metrics Platform Backlog, Foundational Technology Requests
EChetty closed T301403: Investigate wikimedia and wikidata unique devices per-project-family overcount offset as Resolved.
Jan 17 2023, 11:39 AM · Data Pipelines (Sprint 05-06), Data-Engineering-Planning, Product-Analytics
EChetty closed T323951: Table Cleanup - Drop Unused tables as Resolved.
Jan 17 2023, 11:39 AM · Data Pipelines (Sprint 05-06)
EChetty closed T321167: Prepare the fsimage, a subtask of T261283: Productionize HDFS fsimage data analysis job, as Resolved.
Jan 17 2023, 11:39 AM · Data-Engineering, Data Pipelines, Patch-For-Review, Technical-Debt
EChetty closed T321167: Prepare the fsimage as Resolved.
Jan 17 2023, 11:39 AM · Data Pipelines (Sprint 05-06), Patch-For-Review, Technical-Debt
EChetty closed T321168: Create and deploy the fsimage job., a subtask of T261283: Productionize HDFS fsimage data analysis job, as Resolved.
Jan 17 2023, 11:38 AM · Data-Engineering, Data Pipelines, Patch-For-Review, Technical-Debt
EChetty closed T321168: Create and deploy the fsimage job. as Resolved.
Jan 17 2023, 11:38 AM · Data Pipelines (Sprint 05-06), Patch-For-Review, Technical-Debt
EChetty closed T321169: Create a dashboard from the fsImage Dataset extracted from the HDFS FsImage, a subtask of T261283: Productionize HDFS fsimage data analysis job, as Resolved.
Jan 17 2023, 11:38 AM · Data-Engineering, Data Pipelines, Patch-For-Review, Technical-Debt
EChetty closed T321169: Create a dashboard from the fsImage Dataset extracted from the HDFS FsImage as Resolved.
Jan 17 2023, 11:38 AM · Data Pipelines (Sprint 05-06), Patch-For-Review, Technical-Debt
EChetty closed T322534: Spike: Product Analytics ETL options - Timebox 1 Sprint., a subtask of T322532: Notebook Scheduler for Product Analytics, as Resolved.
Jan 17 2023, 11:38 AM · Data-Engineering, Epic, Data Pipelines
EChetty closed T322534: Spike: Product Analytics ETL options - Timebox 1 Sprint. as Resolved.
Jan 17 2023, 11:38 AM · Data Pipelines (Sprint 05-06)
EChetty closed T324850: HDFS data usage pipeline change following deployment (Airflow has no access to hdfs.keytab), a subtask of T261283: Productionize HDFS fsimage data analysis job, as Resolved.
Jan 17 2023, 11:38 AM · Data-Engineering, Data Pipelines, Patch-For-Review, Technical-Debt
EChetty closed T324850: HDFS data usage pipeline change following deployment (Airflow has no access to hdfs.keytab) as Resolved.
Jan 17 2023, 11:38 AM · Data Pipelines (Sprint 05-06)
EChetty moved T323951: Table Cleanup - Drop Unused tables from Incident/Unexpected work to Done on the Data Pipelines (Sprint 05-06) board.
Jan 17 2023, 11:37 AM · Data Pipelines (Sprint 05-06)
EChetty moved T323951: Table Cleanup - Drop Unused tables from Done to Incident/Unexpected work on the Data Pipelines (Sprint 05-06) board.
Jan 17 2023, 11:37 AM · Data Pipelines (Sprint 05-06)
EChetty moved T326339: Use uap-core browser-family for bot detection from To be discussed /To be estimated to Next Up (revisit every 2 sprints) on the Data Pipelines board.
Jan 17 2023, 10:52 AM · Data-Engineering, Data Pipelines

Jan 16 2023

EChetty triaged T320860: Fix mediawiki-history page computation for deleted pages having the same title as Low priority.
Jan 16 2023, 4:50 PM · Data-Engineering, Data Pipelines
EChetty set the point value for T323662: NEW FEATURE REQUEST: Dataset with active and non-active Wikis to 5.
Jan 16 2023, 4:46 PM · Data-Engineering, Data Pipelines
EChetty set the point value for T325185: [Airflow] Implement a NotebookOperator to 1.
Jan 16 2023, 4:39 PM · Data-Engineering, Data Pipelines
EChetty set the point value for T325306: Provide aggregated user device data per-country to 9.
Jan 16 2023, 4:36 PM · Data-Engineering
EChetty set the point value for T326193: Airflow upgrade (refactor deb creation + version bump + switch to PostgreSQL) to 5.
Jan 16 2023, 4:29 PM · Data Pipelines
EChetty set the point value for T326339: Use uap-core browser-family for bot detection to 5.
Jan 16 2023, 4:27 PM · Data-Engineering, Data Pipelines
EChetty set the point value for T327072: Java Prep for Webrequest Load to 9.
Jan 16 2023, 4:22 PM · Patch-For-Review, Data Pipelines (sprint 10)
EChetty set the point value for T324485: [Airflow] Migrate Druid loading Oozie jobs - Parent task to 9.
Jan 16 2023, 4:18 PM · Data Pipelines (Sprint 14)
EChetty moved T309996: [Airflow] Build Druid Operator from Next Up to In Progress on the Data Pipelines (Sprint 07) board.
Jan 16 2023, 4:15 PM · Data Pipelines (Sprint 08), Data-Engineering-Planning
EChetty moved T326195: Edit puppet code to provide Airflow the PostgreSQL connection from Ready to In Review on the Data Pipelines (Sprint 07) board.
Jan 16 2023, 4:14 PM · Data Pipelines (Sprint 11)
EChetty moved T309769: Expanding External Referrer Tracking from Next Up to In Progress on the Data Pipelines (Sprint 07) board.
Jan 16 2023, 4:14 PM · Data Pipelines (Sprint 08), Metrics Platform Backlog, Foundational Technology Requests
EChetty moved T309552: Update Airflow DAGs code to make it compatible with version V2.3.4 of Airflow from Ready to In Review on the Data Pipelines (Sprint 07) board.
Jan 16 2023, 4:14 PM · Data Pipelines (Sprint 07), Data-Engineering-Planning
EChetty moved T309769: Expanding External Referrer Tracking from Ready to Next Up on the Data Pipelines (Sprint 07) board.
Jan 16 2023, 4:14 PM · Data Pipelines (Sprint 08), Metrics Platform Backlog, Foundational Technology Requests
EChetty moved T309996: [Airflow] Build Druid Operator from Ready to Next Up on the Data Pipelines (Sprint 07) board.
Jan 16 2023, 4:14 PM · Data Pipelines (Sprint 08), Data-Engineering-Planning
EChetty moved T315580: Upgrade Puppet code to make Airflow configuration files compatible with version 2.5.0 from Ready to In Review on the Data Pipelines (Sprint 07) board.
Jan 16 2023, 4:14 PM · Data Pipelines (Sprint 11), Vuln-VulnComponent, SecTeam-Processed, Data-Engineering-Planning
EChetty moved T324995: Include EU Registered Country in the canonical country database from Ready to In Review on the Data Pipelines (Sprint 07) board.
Jan 16 2023, 4:14 PM · Product-Analytics (Kanban), Data Pipelines (Sprint 07), Data-Engineering-Planning
EChetty moved T324485: [Airflow] Migrate Druid loading Oozie jobs - Parent task from Ready to Next Up on the Data Pipelines (Sprint 07) board.
Jan 16 2023, 4:13 PM · Data Pipelines (Sprint 14)
EChetty moved T324483: [Migration] Pageview - Learning from Ready to Next Up on the Data Pipelines (Sprint 07) board.
Jan 16 2023, 4:13 PM · Data Pipelines (Sprint 08)
EChetty moved T327072: Java Prep for Webrequest Load from Ready to Next Up on the Data Pipelines (Sprint 07) board.
Jan 16 2023, 4:13 PM · Patch-For-Review, Data Pipelines (sprint 10)
EChetty moved T323614: [M] Reduce image_suggestion HDFS files footprint from Sprint 05-06 to Next Up (revisit every 2 sprints) on the Data Pipelines board.
Jan 16 2023, 4:13 PM · Data Pipelines, Structured-Data-Backlog (Current Work), Image-Suggestions
EChetty moved T326195: Edit puppet code to provide Airflow the PostgreSQL connection from Sprint 05-06 to Sprint 07 on the Data Pipelines board.
Jan 16 2023, 4:12 PM · Data Pipelines (Sprint 11)
EChetty moved T315580: Upgrade Puppet code to make Airflow configuration files compatible with version 2.5.0 from Sprint 05-06 to Sprint 07 on the Data Pipelines board.
Jan 16 2023, 4:11 PM · Data Pipelines (Sprint 11), Vuln-VulnComponent, SecTeam-Processed, Data-Engineering-Planning
EChetty moved T326195: Edit puppet code to provide Airflow the PostgreSQL connection from Done to In Review on the Data Pipelines (Sprint 05-06) board.
Jan 16 2023, 4:11 PM · Data Pipelines (Sprint 11)
EChetty moved T326195: Edit puppet code to provide Airflow the PostgreSQL connection from In Progress to Done on the Data Pipelines (Sprint 05-06) board.
Jan 16 2023, 4:11 PM · Data Pipelines (Sprint 11)
EChetty moved T309552: Update Airflow DAGs code to make it compatible with version V2.3.4 of Airflow from Sprint 05-06 to Sprint 07 on the Data Pipelines board.
Jan 16 2023, 4:10 PM · Data Pipelines (Sprint 07), Data-Engineering-Planning
EChetty moved T309996: [Airflow] Build Druid Operator from Sprint 05-06 to Sprint 07 on the Data Pipelines board.
Jan 16 2023, 4:07 PM · Data Pipelines (Sprint 08), Data-Engineering-Planning
EChetty moved T309769: Expanding External Referrer Tracking from Sprint 05-06 to Sprint 07 on the Data Pipelines board.
Jan 16 2023, 4:07 PM · Data Pipelines (Sprint 08), Metrics Platform Backlog, Foundational Technology Requests
EChetty moved T324995: Include EU Registered Country in the canonical country database from Sprint 05-06 to Sprint 07 on the Data Pipelines board.
Jan 16 2023, 4:06 PM · Product-Analytics (Kanban), Data Pipelines (Sprint 07), Data-Engineering-Planning
EChetty created T327083: End to end test of the Java MPC.
Jan 16 2023, 3:25 PM · Metrics Platform Backlog (Metrics Platform Kanban)
EChetty moved T281773: Publish librarized Metrics Platform Java client to Maven from Ready to Next Up on the Metrics Platform Backlog (Metrics Platform Kanban) board.
Jan 16 2023, 3:23 PM · Metrics Platform Backlog (Metrics Platform Kanban), Product-Data-Infrastructure