
Antoine_Quhen (aqu)
User

User Details

User Since
Jan 4 2022, 1:16 PM (89 w, 3 d)
Availability
Available
LDAP User
Unknown
MediaWiki User
AQuhen (WMF)

Recent Activity

Mon, Sep 18

Antoine_Quhen moved T335862: Implement job to generate Dump XML files from In progress to In Review on the Data Engineering and Event Platform Team (Sprint 2) board.
Mon, Sep 18, 4:09 PM · Data Products (Sprint 01), Data Engineering and Event Platform Team (Sprint 2), Patch-For-Review, Data Pipelines (Sprint 14)

Mon, Sep 11

Antoine_Quhen added a comment to T346084: eventutilities-python: Gitlab CI pipeline should use memory optimized runners..

For Airflow dags, we are using trusted-runners provided by rel-eng.

Mon, Sep 11, 7:54 PM · Data Engineering and Event Platform Team (Sprint 2), Data-Engineering, Event-Platform
Antoine_Quhen added a comment to T346085: [BUG] eventutilites-python: fix type checking CI job.
Mon, Sep 11, 7:45 PM · Data Engineering and Event Platform Team (Sprint 2), Data-Engineering, Event-Platform

Thu, Sep 7

Antoine_Quhen added a comment to T335862: Implement job to generate Dump XML files.

I have done part of the refactor in this change: https://gerrit.wikimedia.org/r/c/analytics/refinery/source/+/938941/2..3
including:

  • Adding unit tests for the important parts of the code
  • Adding new traits and classes, and renaming some classes for better comprehension
Thu, Sep 7, 4:25 PM · Data Products (Sprint 01), Data Engineering and Event Platform Team (Sprint 2), Patch-For-Review, Data Pipelines (Sprint 14)

Wed, Sep 6

Antoine_Quhen updated subscribers of T335862: Implement job to generate Dump XML files.

What has been done as a first step:

  • Custom partitioner POC
  • First implementation
  • Clarification of source and result expectations
Wed, Sep 6, 8:50 AM · Data Products (Sprint 01), Data Engineering and Event Platform Team (Sprint 2), Patch-For-Review, Data Pipelines (Sprint 14)

Mon, Sep 4

Antoine_Quhen added a comment to T343232: Configure Airflow to send metrics to Prometheus.

One metric that could have been useful is the number of task retries.

Mon, Sep 4, 11:06 AM · Observability-Metrics, Data-Platform-SRE, Data-Engineering

Wed, Aug 30

Antoine_Quhen created T345232: Allow retry on Airflow druid_load_webrequest_sampled_128_daily.remove_temporary_directory.
Wed, Aug 30, 8:50 AM · Data Engineering and Event Platform Team

Aug 17 2023

Antoine_Quhen added a comment to T335862: Implement job to generate Dump XML files.

OK to move to Gitlab. 👍 I'm making it work first.

Aug 17 2023, 9:42 AM · Data Products (Sprint 01), Data Engineering and Event Platform Team (Sprint 2), Patch-For-Review, Data Pipelines (Sprint 14)

Jul 17 2023

Antoine_Quhen added a comment to T335862: Implement job to generate Dump XML files.

I have the first draft version in Gerrit.

Jul 17 2023, 9:51 PM · Data Products (Sprint 01), Data Engineering and Event Platform Team (Sprint 2), Patch-For-Review, Data Pipelines (Sprint 14)

Jul 8 2023

Antoine_Quhen closed T304889: Airflow CI/CD Documentation, a subtask of T295199: [Airflow] User manual and documentation, as Resolved.
Jul 8 2023, 2:27 PM · Data Engineering and Event Platform Team, Data-Engineering-Kanban, Epic, Data Pipelines, Data-Engineering
Antoine_Quhen closed T304889: Airflow CI/CD Documentation as Resolved.

Covered here: https://wikitech.wikimedia.org/wiki/Data_Engineering/Systems/Airflow/Developer_guide

Jul 8 2023, 2:26 PM · Data Engineering and Event Platform Team, Documentation, Data Pipelines

Jul 6 2023

Antoine_Quhen added a comment to T340673: Update Airflow Documentation.

Maybe add here: https://wikitech.wikimedia.org/wiki/Data_Engineering/Systems/Airflow/Developer_guide#Conventions_&_good_practices

Jul 6 2023, 2:56 PM · Data Engineering and Event Platform Team, Data Pipelines

Jun 29 2023

Antoine_Quhen moved T335862: Implement job to generate Dump XML files from Next Up to In Progress on the Data Pipelines (Sprint 14) board.
Jun 29 2023, 4:05 PM · Data Products (Sprint 01), Data Engineering and Event Platform Team (Sprint 2), Patch-For-Review, Data Pipelines (Sprint 14)
Antoine_Quhen claimed T335862: Implement job to generate Dump XML files.
Jun 29 2023, 4:05 PM · Data Products (Sprint 01), Data Engineering and Event Platform Team (Sprint 2), Patch-For-Review, Data Pipelines (Sprint 14)

Jun 27 2023

Antoine_Quhen closed T338033: Wikistats Bug: Small countries not displayed on the map as Resolved.

Three of our datasets are now going to use canonical.countries.is_protected:

  • Cassandra AQS pageview_top_percountry_daily
  • Cassandra AQS pageview_top_bycountry_monthly
  • Hive geoeditors_public_monthly
Jun 27 2023, 5:00 PM · Patch-For-Review, Data Pipelines (Sprint 14), Data-Engineering, Data-Engineering-Wikistats
Antoine_Quhen added a comment to T318346: Add Python Linter Checks to CI.

Documentation added here: https://wikitech.wikimedia.org/wiki/Data_Engineering/Systems/Airflow/Developer_guide

Jun 27 2023, 3:59 PM · Patch-For-Review, Data Pipelines (Sprint 14), Data-Engineering-Planning
Antoine_Quhen updated subscribers of T339805: Add cswiki to clickstream.

Hello @lbowmaker, a patch has already been proposed for this ticket.

Jun 27 2023, 7:55 AM · Data Engineering and Event Platform Team, Privacy Engineering, Data Pipelines

Jun 26 2023

Antoine_Quhen moved T338033: Wikistats Bug: Small countries not displayed on the map from In Review to Ready to Deploy on the Data Pipelines (Sprint 14) board.
Jun 26 2023, 4:07 PM · Patch-For-Review, Data Pipelines (Sprint 14), Data-Engineering, Data-Engineering-Wikistats

Jun 21 2023

Antoine_Quhen moved T333004: Setup config to allow lineage instrumentation from Ready to Deploy to Done on the Data Pipelines (Sprint 14) board.
Jun 21 2023, 3:59 PM · Data Pipelines (Sprint 14), Data-Engineering-Planning

Jun 20 2023

Antoine_Quhen updated the task description for T339928: Canonical-data ownership, definition and update.
Jun 20 2023, 3:46 PM · Movement-Insights, Data-Engineering
Antoine_Quhen created T339928: Canonical-data ownership, definition and update.
Jun 20 2023, 2:13 PM · Movement-Insights, Data-Engineering

Jun 19 2023

Antoine_Quhen moved T338033: Wikistats Bug: Small countries not displayed on the map from In Progress to In Review on the Data Pipelines (Sprint 14) board.
Jun 19 2023, 3:58 PM · Patch-For-Review, Data Pipelines (Sprint 14), Data-Engineering, Data-Engineering-Wikistats

Jun 13 2023

Antoine_Quhen updated subscribers of T338033: Wikistats Bug: Small countries not displayed on the map.

I'm proposing with those patches:

Jun 13 2023, 3:29 PM · Patch-For-Review, Data Pipelines (Sprint 14), Data-Engineering, Data-Engineering-Wikistats

Jun 8 2023

Antoine_Quhen moved T338033: Wikistats Bug: Small countries not displayed on the map from Next Up to In Progress on the Data Pipelines (Sprint 14) board.
Jun 8 2023, 3:58 PM · Patch-For-Review, Data Pipelines (Sprint 14), Data-Engineering, Data-Engineering-Wikistats
Antoine_Quhen claimed T338033: Wikistats Bug: Small countries not displayed on the map.
Jun 8 2023, 3:58 PM · Patch-For-Review, Data Pipelines (Sprint 14), Data-Engineering, Data-Engineering-Wikistats
Antoine_Quhen moved T333004: Setup config to allow lineage instrumentation from In Progress to In Review on the Data Pipelines (Sprint 14) board.
Jun 8 2023, 3:54 PM · Data Pipelines (Sprint 14), Data-Engineering-Planning

Jun 7 2023

Antoine_Quhen added a comment to T326570: Migrate custom gitlab runner that runs Dockerfiles to releng's new production infra.

Should we merge those CI pipeline changes to make it the standard in workflow utils?

We have three new pipelines in this MR:

  • Build image
  • Run tests
  • Build & publish artifacts into the registry
Jun 7 2023, 3:15 PM · Data Engineering and Event Platform Team, Data Pipelines
Antoine_Quhen moved T333004: Setup config to allow lineage instrumentation from Blocked/Paused to In Progress on the Data Pipelines (Sprint 14) board.
Jun 7 2023, 1:56 PM · Data Pipelines (Sprint 14), Data-Engineering-Planning
Antoine_Quhen moved T318346: Add Python Linter Checks to CI from In Progress to In Review on the Data Pipelines (Sprint 14) board.
Jun 7 2023, 1:56 PM · Patch-For-Review, Data Pipelines (Sprint 14), Data-Engineering-Planning
Antoine_Quhen moved T318346: Add Python Linter Checks to CI from In Review to In Progress on the Data Pipelines (Sprint 14) board.
Jun 7 2023, 1:56 PM · Patch-For-Review, Data Pipelines (Sprint 14), Data-Engineering-Planning
Antoine_Quhen added a comment to T318346: Add Python Linter Checks to CI.

I have an MR with:

Jun 7 2023, 8:58 AM · Patch-For-Review, Data Pipelines (Sprint 14), Data-Engineering-Planning
Antoine_Quhen added a comment to T326570: Migrate custom gitlab runner that runs Dockerfiles to releng's new production infra.

https://gitlab.wikimedia.org/repos/data-engineering/airflow-dags/-/merge_requests/417

Jun 7 2023, 8:54 AM · Data Engineering and Event Platform Team, Data Pipelines

Jun 5 2023

Antoine_Quhen added a comment to T336744: Harmonize tags across Airflow dags.

Today we decided not to automate the tagging process.

Jun 5 2023, 4:39 PM · Data Engineering and Event Platform Team (Sprint 2), Data Products (Sprint 00), Data Pipelines (Sprint 14)
Antoine_Quhen moved T318346: Add Python Linter Checks to CI from In Progress to In Review on the Data Pipelines (Sprint 14) board.
Jun 5 2023, 4:01 PM · Patch-For-Review, Data Pipelines (Sprint 14), Data-Engineering-Planning

May 30 2023

Antoine_Quhen moved T318346: Add Python Linter Checks to CI from Next Up to In Progress on the Data Pipelines (Sprint 14) board.
May 30 2023, 4:08 PM · Patch-For-Review, Data Pipelines (Sprint 14), Data-Engineering-Planning

May 29 2023

Antoine_Quhen added a comment to T326570: Migrate custom gitlab runner that runs Dockerfiles to releng's new production infra.

https://gitlab.wikimedia.org/repos/releng/gitlab-trusted-runner/-/merge_requests/27/commits
https://gitlab.wikimedia.org/repos/releng/gitlab-trusted-runner/-/merge_requests/26/commits

May 29 2023, 3:26 PM · Data Engineering and Event Platform Team, Data Pipelines
Antoine_Quhen updated the task description for T326570: Migrate custom gitlab runner that runs Dockerfiles to releng's new production infra.
May 29 2023, 3:07 PM · Data Engineering and Event Platform Team, Data Pipelines
Antoine_Quhen claimed T318346: Add Python Linter Checks to CI.
May 29 2023, 6:13 AM · Patch-For-Review, Data Pipelines (Sprint 14), Data-Engineering-Planning

May 25 2023

Antoine_Quhen renamed T336718: Write data to Iceberg formatted tables (mediawiki.page_content_change) from Write data to Iceberg formatted tables to Write data to Iceberg formatted tables (mediawiki.page_content_change).
May 25 2023, 4:06 PM · Data Engineering and Event Platform Team (Sprint 1), Data Pipelines (Sprint 14)
Antoine_Quhen moved T336718: Write data to Iceberg formatted tables (mediawiki.page_content_change) from In Progress to Next Up on the Data Pipelines (Sprint 14) board.
May 25 2023, 3:29 PM · Data Engineering and Event Platform Team (Sprint 1), Data Pipelines (Sprint 14)
Antoine_Quhen placed T336718: Write data to Iceberg formatted tables (mediawiki.page_content_change) up for grabs.
May 25 2023, 3:28 PM · Data Engineering and Event Platform Team (Sprint 1), Data Pipelines (Sprint 14)
Antoine_Quhen claimed T336718: Write data to Iceberg formatted tables (mediawiki.page_content_change).
May 25 2023, 3:28 PM · Data Engineering and Event Platform Team (Sprint 1), Data Pipelines (Sprint 14)
Antoine_Quhen closed T335917: Update Sqoop for externallinks table changes, a subtask of T312666: Remove duplication in externallinks table, as Resolved.
May 25 2023, 11:08 AM · MW-1.41-notes (1.41.0-wmf.26; 2023-09-12), Patch-For-Review, MediaWiki-Page-derived-data, DBA
Antoine_Quhen closed T335917: Update Sqoop for externallinks table changes as Resolved.
May 25 2023, 11:08 AM · Data Pipelines (Sprint 14), Data-Engineering
Antoine_Quhen moved T335917: Update Sqoop for externallinks table changes from Ready to Deploy to Done on the Data Pipelines (Sprint 14) board.
May 25 2023, 11:07 AM · Data Pipelines (Sprint 14), Data-Engineering
Antoine_Quhen moved T336798: Fix druid_load_pageviews_daily_aggregated_monthly from Ready to Deploy to Done on the Data Pipelines (Sprint 14) board.
May 25 2023, 11:07 AM · Data Pipelines (Sprint 14)

May 24 2023

Antoine_Quhen moved T335917: Update Sqoop for externallinks table changes from In Review to Ready to Deploy on the Data Pipelines (Sprint 14) board.
May 24 2023, 4:03 PM · Data Pipelines (Sprint 14), Data-Engineering
Antoine_Quhen added a comment to T330236: Event partitions missing since 2023-02-21T10:00 for stream without events (canary events not produced?).

There is a second problem hidden behind the missing Scala lib: the Guava version mismatch between the one provided by Hadoop and the one included in eventutilities.

May 24 2023, 2:55 PM · Event-Platform (Sprint 14 B), Data Pipelines (Sprint 14), Data-Engineering-Planning
Antoine_Quhen added a comment to T335917: Update Sqoop for externallinks table changes.

The Sqoop test is conclusive, and the patch can be merged right now.

May 24 2023, 1:53 PM · Data Pipelines (Sprint 14), Data-Engineering
Antoine_Quhen created P48501 Sqooping test following schema update.
May 24 2023, 1:52 PM · Data Pipelines
Antoine_Quhen added a comment to T330236: Event partitions missing since 2023-02-21T10:00 for stream without events (canary events not produced?).

Refinery-source no longer ships Scala: it used to come in through wikihadoop, which is no longer included.
https://archiva.wikimedia.org/#artifact-dependencies/org.wikimedia/wikihadoop/0.3-wmf1

May 24 2023, 9:01 AM · Event-Platform (Sprint 14 B), Data Pipelines (Sprint 14), Data-Engineering-Planning

May 23 2023

Antoine_Quhen added a project to T330236: Event partitions missing since 2023-02-21T10:00 for stream without events (canary events not produced?): Data Pipelines.
May 23 2023, 4:36 PM · Event-Platform (Sprint 14 B), Data Pipelines (Sprint 14), Data-Engineering-Planning
Antoine_Quhen moved T333004: Setup config to allow lineage instrumentation from In Review to Blocked/Paused on the Data Pipelines (Sprint 14) board.

Thanks all for the reviews. Even though the DAG is working, it would be good to decide on a single source of truth for our dataset metadata. Right now, it's located in:

  • airflow-dags/../dataset.yml
  • airflow-dags/../..._dag.py
  • DataHub
May 23 2023, 4:00 PM · Data Pipelines (Sprint 14), Data-Engineering-Planning

May 22 2023

Antoine_Quhen changed the status of T335917: Update Sqoop for externallinks table changes from Open to In Progress.
May 22 2023, 9:07 PM · Data Pipelines (Sprint 14), Data-Engineering
Antoine_Quhen changed the status of T335917: Update Sqoop for externallinks table changes, a subtask of T312666: Remove duplication in externallinks table, from Open to In Progress.
May 22 2023, 9:07 PM · MW-1.41-notes (1.41.0-wmf.26; 2023-09-12), Patch-For-Review, MediaWiki-Page-derived-data, DBA
Antoine_Quhen created P48460 Update mediawiki.externallinks migration script.
May 22 2023, 9:03 PM · Data-Engineering
Antoine_Quhen added a comment to T335917: Update Sqoop for externallinks table changes.

Do you know if there is a DB with the new schema version? It would be cool to have a place to test the import.

May 22 2023, 8:57 PM · Data Pipelines (Sprint 14), Data-Engineering
Antoine_Quhen closed T325266: Replace refinery-source Guava caches by Caffeine as Resolved.
May 22 2023, 12:44 PM · Event-Platform, Data-Engineering-Planning
Antoine_Quhen closed T325266: Replace refinery-source Guava caches by Caffeine, a subtask of T327072: Java Prep for Webrequest Load, as Resolved.
May 22 2023, 12:44 PM · Patch-For-Review, Data Pipelines (sprint 10)

May 16 2023

Antoine_Quhen claimed T335917: Update Sqoop for externallinks table changes.
May 16 2023, 10:44 AM · Data Pipelines (Sprint 14), Data-Engineering
Antoine_Quhen closed T326193: Airflow upgrade (refactor deb creation + version bump + switch to PostgreSQL) as Resolved.
May 16 2023, 10:42 AM · Data Pipelines
Antoine_Quhen added a parent task for T336745: Split Cassandra Airflow dags by dataset: T336739: Post Oozie -> Airflow migration refactorings.
May 16 2023, 10:17 AM · Data Engineering and Event Platform Team (Sprint 0), Data Pipelines (Sprint 14)
Antoine_Quhen added a subtask for T336739: Post Oozie -> Airflow migration refactorings: T336745: Split Cassandra Airflow dags by dataset.
May 16 2023, 10:17 AM · Epic, Data Engineering and Event Platform Team, Data Pipelines
Antoine_Quhen created T336745: Split Cassandra Airflow dags by dataset.
May 16 2023, 10:17 AM · Data Engineering and Event Platform Team (Sprint 0), Data Pipelines (Sprint 14)
Antoine_Quhen added a parent task for T336744: Harmonize tags across Airflow dags: T336739: Post Oozie -> Airflow migration refactorings.
May 16 2023, 10:06 AM · Data Engineering and Event Platform Team (Sprint 2), Data Products (Sprint 00), Data Pipelines (Sprint 14)
Antoine_Quhen added a subtask for T336739: Post Oozie -> Airflow migration refactorings: T336744: Harmonize tags across Airflow dags.
May 16 2023, 10:06 AM · Epic, Data Engineering and Event Platform Team, Data Pipelines
Antoine_Quhen created T336744: Harmonize tags across Airflow dags.
May 16 2023, 10:06 AM · Data Engineering and Event Platform Team (Sprint 2), Data Products (Sprint 00), Data Pipelines (Sprint 14)
Antoine_Quhen moved T336739: Post Oozie -> Airflow migration refactorings from Backlog to Epics on the Data Pipelines board.
May 16 2023, 9:58 AM · Epic, Data Engineering and Event Platform Team, Data Pipelines
Antoine_Quhen added a parent task for T336741: Make sure all partitions sensors are using the Dataset helpers: T336739: Post Oozie -> Airflow migration refactorings.
May 16 2023, 9:58 AM · Data Engineering and Event Platform Team, Data Pipelines
Antoine_Quhen added a subtask for T336739: Post Oozie -> Airflow migration refactorings: T336741: Make sure all partitions sensors are using the Dataset helpers.
May 16 2023, 9:58 AM · Epic, Data Engineering and Event Platform Team, Data Pipelines
Antoine_Quhen created T336741: Make sure all partitions sensors are using the Dataset helpers.
May 16 2023, 9:57 AM · Data Engineering and Event Platform Team, Data Pipelines
Antoine_Quhen added a parent task for T336738: Refactor our existing Airflow dags to use EasyDAG & DagProperties: T336739: Post Oozie -> Airflow migration refactorings.
May 16 2023, 9:52 AM · Data Engineering and Event Platform Team, Data Pipelines
Antoine_Quhen added a subtask for T336739: Post Oozie -> Airflow migration refactorings: T336738: Refactor our existing Airflow dags to use EasyDAG & DagProperties.
May 16 2023, 9:52 AM · Epic, Data Engineering and Event Platform Team, Data Pipelines
Antoine_Quhen created T336739: Post Oozie -> Airflow migration refactorings.
May 16 2023, 9:51 AM · Epic, Data Engineering and Event Platform Team, Data Pipelines
Antoine_Quhen updated the task description for T336738: Refactor our existing Airflow dags to use EasyDAG & DagProperties.
May 16 2023, 9:48 AM · Data Engineering and Event Platform Team, Data Pipelines
Antoine_Quhen created T336738: Refactor our existing Airflow dags to use EasyDAG & DagProperties.
May 16 2023, 9:47 AM · Data Engineering and Event Platform Team, Data Pipelines

May 15 2023

Antoine_Quhen added a comment to T333004: Setup config to allow lineage instrumentation.

Here is a standardized version of the first iteration, meant to be easy to use for people without knowledge of DataHub: https://gitlab.wikimedia.org/repos/data-engineering/airflow-dags/-/merge_requests/386

May 15 2023, 2:15 PM · Data Pipelines (Sprint 14), Data-Engineering-Planning

May 11 2023

Antoine_Quhen added a comment to T333004: Setup config to allow lineage instrumentation.

Some proposals for an immediate and more useful next step:

May 11 2023, 3:30 PM · Data Pipelines (Sprint 14), Data-Engineering-Planning
Antoine_Quhen moved T333004: Setup config to allow lineage instrumentation from In Progress to In Review on the Data Pipelines (Sprint 12) board.
May 11 2023, 1:50 PM · Data Pipelines (Sprint 14), Data-Engineering-Planning
Antoine_Quhen added a comment to T333004: Setup config to allow lineage instrumentation.

https://gitlab.wikimedia.org/repos/data-engineering/airflow-dags/-/merge_requests/382

May 11 2023, 1:45 PM · Data Pipelines (Sprint 14), Data-Engineering-Planning

May 10 2023

Antoine_Quhen added a comment to T333004: Setup config to allow lineage instrumentation.

Update: I'm emitting metadata to Kafka from an ad-hoc Airflow data lineage task. The configuration sets up the communication with Kafka and with the schema registry, Karapace. The metadata is then correctly fetched by the mce-consumer service on the DataHub side. Next, I'm looking to use the detailed version of the data lineage event, which contains more information than just the upstream<>downstream link.

May 10 2023, 8:55 PM · Data Pipelines (Sprint 14), Data-Engineering-Planning

May 4 2023

Antoine_Quhen moved T334101: [Airflow] Migrate mediawiki geoeditors druid loading job from In Review to Ready to Deploy on the Data Pipelines (Sprint 12) board.
May 4 2023, 12:57 PM · Patch-For-Review, Data Pipelines (Sprint 12)

May 3 2023

Antoine_Quhen closed T332707: Auto clean /wmf/data/raw/webrequests_data_loss as Resolved.

I've checked the result on HDFS. It performs as expected.

May 3 2023, 4:50 PM · Data Pipelines (Sprint 12)

May 2 2023

Antoine_Quhen moved T333004: Setup config to allow lineage instrumentation from Next Up to In Progress on the Data Pipelines (Sprint 12) board.
May 2 2023, 2:01 PM · Data Pipelines (Sprint 14), Data-Engineering-Planning
Antoine_Quhen claimed T333004: Setup config to allow lineage instrumentation.
May 2 2023, 2:00 PM · Data Pipelines (Sprint 14), Data-Engineering-Planning

Apr 24 2023

Antoine_Quhen moved T334101: [Airflow] Migrate mediawiki geoeditors druid loading job from Next Up to In Progress on the Data Pipelines (Sprint 12) board.
Apr 24 2023, 4:04 PM · Patch-For-Review, Data Pipelines (Sprint 12)
Antoine_Quhen claimed T334101: [Airflow] Migrate mediawiki geoeditors druid loading job.
Apr 24 2023, 9:19 AM · Patch-For-Review, Data Pipelines (Sprint 12)
Antoine_Quhen moved T332707: Auto clean /wmf/data/raw/webrequests_data_loss from In Review to Ready to Deploy on the Data Pipelines (Sprint 11) board.
Apr 24 2023, 9:11 AM · Data Pipelines (Sprint 12)

Apr 14 2023

Antoine_Quhen moved T332707: Auto clean /wmf/data/raw/webrequests_data_loss from In Progress to In Review on the Data Pipelines (Sprint 11) board.
Apr 14 2023, 8:20 AM · Data Pipelines (Sprint 12)

Apr 13 2023

Antoine_Quhen updated the task description for T332707: Auto clean /wmf/data/raw/webrequests_data_loss.
Apr 13 2023, 3:19 PM · Data Pipelines (Sprint 12)
Antoine_Quhen created T334678: webrequest / webrequest raw quality check .
Apr 13 2023, 3:19 PM · Data Engineering and Event Platform Team, Data Pipelines
Antoine_Quhen added a comment to T332707: Auto clean /wmf/data/raw/webrequests_data_loss.

OK to separate the migration from this task.

Apr 13 2023, 2:27 PM · Data Pipelines (Sprint 12)
Antoine_Quhen added a comment to T327073: Write Airflow DAG to move the webrequest load job to airflow..

Bug: There is an extra systemd check making sure SUCCESS files are generated:
https://github.com/wikimedia/operations-puppet/blob/fc98a524be9be65935b8d80b506ca33af5d442b2/modules/profile/manifests/analytics/refinery/job/data_check.pp#L27

Apr 13 2023, 2:26 PM · Data Pipelines (Sprint 11), Patch-For-Review
Antoine_Quhen updated the task description for T332707: Auto clean /wmf/data/raw/webrequests_data_loss.
Apr 13 2023, 10:00 AM · Data Pipelines (Sprint 12)

Apr 12 2023

Antoine_Quhen moved T327073: Write Airflow DAG to move the webrequest load job to airflow. from Ready to Deploy to Done on the Data Pipelines (Sprint 11) board.
Apr 12 2023, 10:36 AM · Data Pipelines (Sprint 11), Patch-For-Review

Apr 11 2023

Antoine_Quhen added a comment to T333001: Setup for allowing Airflow deployment via Git Repository.

I like idea A because the conda env encapsulates all needed libs.

Apr 11 2023, 5:24 PM · Data Pipelines (Sprint 12)
Antoine_Quhen updated the task description for T334493: analytics/refinery deployment broken at refinery-deploy-to-hdfs.
Apr 11 2023, 4:55 PM · Data-Platform-SRE
Antoine_Quhen created T334493: analytics/refinery deployment broken at refinery-deploy-to-hdfs.
Apr 11 2023, 4:54 PM · Data-Platform-SRE
Antoine_Quhen moved T332707: Auto clean /wmf/data/raw/webrequests_data_loss from Next Up to In Progress on the Data Pipelines (Sprint 11) board.
Apr 11 2023, 8:55 AM · Data Pipelines (Sprint 12)