amastilovic (Aleksandar Mastilovic)
User

User Details

User Since
Jan 20 2024, 12:05 AM (98 w, 1 d)
Availability
Available
IRC Nick
amastilovic
LDAP User
Aleksandar Mastilovic
MediaWiki User
AMastilovic-WMF [ Global Accounts ]

Recent Activity

Tue, Dec 2

amastilovic created T411536: Set a custom From: email address for alerts from Airflow dev instances.
Tue, Dec 2, 6:08 PM · Patch-For-Review, Data-Engineering (Q2 FY25/26 October 1st - December 31th)

Mon, Dec 1

amastilovic moved T400283: Clean up airflow-dags gitlab-ci.yaml CI/CD pipelines from In progress to Done on the Data-Engineering (Q2 FY25/26 October 1st - December 31th) board.
Mon, Dec 1, 4:54 PM · Data-Engineering (Q2 FY25/26 October 1st - December 31th)
amastilovic moved T410285: SDS 1.3.6 SPUR bot detection - Productionize SPUR datasets import from In progress to In Review on the Data-Engineering (Q2 FY25/26 October 1st - December 31th) board.
Mon, Dec 1, 4:44 PM · Data-Engineering (Q2 FY25/26 October 1st - December 31th)
amastilovic moved T409601: Review and productionize the WME differential privacy data set from Next Up to In progress on the Data-Engineering (Q2 FY25/26 October 1st - December 31th) board.
Mon, Dec 1, 4:32 PM · Patch-For-Review, Data-Engineering (Q2 FY25/26 October 1st - December 31th)
amastilovic moved T410688: Implement a new pipeline and table with reconciled historical revision data from Urgent to In Review on the Data-Engineering (Q2 FY25/26 October 1st - December 31th) board.
Mon, Dec 1, 4:25 PM · Data-Engineering (Q2 FY25/26 October 1st - December 31th)

Tue, Nov 25

amastilovic closed T409782: Update thresholds configuration for MediaWiki History Reduced error checks as Resolved.
Tue, Nov 25, 9:55 PM · Data-Engineering (Q2 FY25/26 October 1st - December 31th)
amastilovic added a comment to T377023: Add CI step to event schema repositories to test to fail if a schema is deleted.

Ditto what @xcollazo said above. In order to have the desired behavior for this pipeline job, I think you need:

Tue, Nov 25, 2:05 AM · Patch-For-Review, Data-Engineering (Q2 FY25/26 October 1st - December 31th), Event-Platform
amastilovic created T410972: Requesting access to cassandra-staging-devs group for amastilovic.
Tue, Nov 25, 1:04 AM · SRE, SRE-Access-Requests

Mon, Nov 24

amastilovic added a comment to T410962: Provision Global Editor Metrics tables & endpoints.

@Eevans thank you for that MR! You are correct, wiki_id should be TEXT - we've already implemented it in the Hive counterpart for that table: https://gerrit.wikimedia.org/r/c/analytics/refinery/+/1206879

Mon, Nov 24, 10:50 PM · Data-Engineering (Q2 FY25/26 October 1st - December 31th), Data-Persistence

Tue, Nov 18

amastilovic added a comment to T409782: Update thresholds configuration for MediaWiki History Reduced error checks.

Addressed in https://gitlab.wikimedia.org/repos/data-engineering/airflow-dags/-/merge_requests/1794

Tue, Nov 18, 1:58 AM · Data-Engineering (Q2 FY25/26 October 1st - December 31th)

Fri, Nov 14

amastilovic updated subscribers of T405039: Global Editor Metrics - Data Pipeline.

OK so I've now officially backfilled the wmf_contributors. and wmf_readership. tables, but the process I had to use in order for the number of files to be small enough is complicated enough that it warrants being documented somewhere:

Fri, Nov 14, 7:37 PM · Data-Engineering (Q2 FY25/26 October 1st - December 31th), OKR-Work, MediaWiki-Page-derived-data

Mon, Nov 10

amastilovic created T409782: Update thresholds configuration for MediaWiki History Reduced error checks.
Mon, Nov 10, 8:44 PM · Data-Engineering (Q2 FY25/26 October 1st - December 31th)

Fri, Nov 7

amastilovic updated the task description for T409514: Migrate Sqoop jobs to Airflow.
Fri, Nov 7, 3:02 AM · Data-Engineering (Q2 FY25/26 October 1st - December 31th)
amastilovic created T409514: Migrate Sqoop jobs to Airflow.
Fri, Nov 7, 1:04 AM · Data-Engineering (Q2 FY25/26 October 1st - December 31th)

Oct 31 2025

amastilovic added a comment to T408942: Add code styles rules to analytics-refinery-source.

We definitely already have the maven-checkstyle-plugin set up in the main pom.xml - I know because it's very annoying since the codebase doesn't seem to conform to the style being checked, and on each compile it produces a ton of ERRORs in output.

Oct 31 2025, 9:39 PM · Data-Engineering, Essential-Work

Oct 30 2025

amastilovic added a comment to T408687: Create example dbt models using Iceberg.

That specific use-case sounds like what dbt calls a microbatch incremental strategy that replaces time intervals given the event_time column: https://docs.getdbt.com/docs/build/incremental-microbatch

Oct 30 2025, 8:18 PM · Data-Platform-SRE (2025.11.07 - 2025.11.28), Essential-Work, Movement-Insights, Data-Engineering (Q2 FY25/26 October 1st - December 31th), Epic
amastilovic added a comment to T407322: Create dbt folder structure.

@JMonton-WMF I think we could use this task to include a .dbtignore file that will let dbt commands ignore the .ipynb_checkpoint folders: https://docs.getdbt.com/reference/dbtignore

Oct 30 2025, 5:06 PM · Data-Engineering (Q2 FY25/26 October 1st - December 31th)
amastilovic added a comment to T408687: Create example dbt models using Iceberg.

insert_overwrite is what @JAllemandou is describing, perfect.

Oct 30 2025, 4:46 PM · Data-Platform-SRE (2025.11.07 - 2025.11.28), Essential-Work, Movement-Insights, Data-Engineering (Q2 FY25/26 October 1st - December 31th), Epic

Oct 29 2025

amastilovic added a comment to T407994: Move Druid realtime configuration out of Refinery into standalone repo on GitLab.

Do we want only the Druid realtime configs in their own repo? Perhaps we want the batch ones in the same place?

My uninformed thought on this is that this should be a "Druid config stuff" repo, which would therefore include both realtime AND batch configs :)

Oct 29 2025, 11:42 PM · SRE, Data-Engineering (Q2 FY25/26 October 1st - December 31th)

Oct 28 2025

amastilovic added a comment to T406263: mediawiki_history_reduced - add page_id and user_central_id fields.

The user_id, user_central_id and page_id fields are now available in both the Hive dataset wmf.mediawiki_history_reduced and the corresponding Druid dataset.

Oct 28 2025, 8:26 PM · Data-Engineering (Q2 FY25/26 October 1st - December 31th), OKR-Work, MediaWiki-Page-derived-data
amastilovic updated the task description for T406263: mediawiki_history_reduced - add page_id and user_central_id fields.
Oct 28 2025, 8:25 PM · Data-Engineering (Q2 FY25/26 October 1st - December 31th), OKR-Work, MediaWiki-Page-derived-data
amastilovic updated the task description for T406263: mediawiki_history_reduced - add page_id and user_central_id fields.
Oct 28 2025, 8:20 PM · Data-Engineering (Q2 FY25/26 October 1st - December 31th), OKR-Work, MediaWiki-Page-derived-data

Oct 22 2025

amastilovic claimed T365648: Add user_central_id to mediawiki_history and mediawiki_history_reduced Hive tables.
Oct 22 2025, 11:55 PM · Data-Engineering, Data Pipelines
amastilovic added a comment to T406766: Add dbt related packages to conda-analytics.

Yes, that's true. But conda-analytics isn't necessarily a long-term solution. I'd be much happier to start out with a container-based solution as per T406636: Create a dbt Docker container, but container runtimes are not available to us on the stat servers at the moment.

At least this way, we will have something uniform to work with already on the stat servers.

Oct 22 2025, 11:33 PM · OKR-Work, Data-Platform-SRE (2025.10.17 - 2025.11.07), Data-Engineering (Q2 FY25/26 October 1st - December 31th)
amastilovic updated the task description for T407994: Move Druid realtime configuration out of Refinery into standalone repo on GitLab.
Oct 22 2025, 7:09 PM · SRE, Data-Engineering (Q2 FY25/26 October 1st - December 31th)
amastilovic created T407994: Move Druid realtime configuration out of Refinery into standalone repo on GitLab.
Oct 22 2025, 3:39 PM · SRE, Data-Engineering (Q2 FY25/26 October 1st - December 31th)
amastilovic added a comment to T406069: Global Editor Metrics - Druid mediawiki_history_reduced changes.

@Ottomata so it sounds like we are ready to accept the mediawiki_history_reduced dataset as it is right now, but with user_central_id and page_id columns added? If so, I'll start backfilling September 2025.

Oct 22 2025, 1:57 AM · Data-Engineering (Q2 FY25/26 October 1st - December 31th), OKR-Work, MediaWiki-Page-derived-data

Oct 21 2025

amastilovic added a comment to T405039: Global Editor Metrics - Data Pipeline.

WIP Typo! I do that when I'm creating test tables so I can increment as I make changes. Final one will not have 0, but maybe _v1 if we want to version it!

Oct 21 2025, 11:24 PM · Data-Engineering (Q2 FY25/26 October 1st - December 31th), OKR-Work, MediaWiki-Page-derived-data
amastilovic added a comment to T406766: Add dbt related packages to conda-analytics.

@BTullis wouldn't this approach introduce a discrepancy between what users use on stat boxes and what is run in GitLab CI/CD and eventually in Airflow? The latter two will run in Docker images, and I wonder how different the two installations will end up being.

Oct 21 2025, 11:22 PM · OKR-Work, Data-Platform-SRE (2025.10.17 - 2025.11.07), Data-Engineering (Q2 FY25/26 October 1st - December 31th)

Oct 17 2025

amastilovic added a comment to T405039: Global Editor Metrics - Data Pipeline.

@Ottomata Regarding the naming of user_edited_pages as described in T407559 - why do we need the zero at the end of the table name? user_edited_pages_daily0 looks unnecessary to me, but I could be convinced otherwise.

Oct 17 2025, 4:44 PM · Data-Engineering (Q2 FY25/26 October 1st - December 31th), OKR-Work, MediaWiki-Page-derived-data

Oct 14 2025

amastilovic added a comment to T407020: Kokkuri build failure on airflow-dags repo.

Oh TIL that we can use kokkuri as a GitLab component! Thumbs up.

Oct 14 2025, 3:21 PM · Patch-For-Review, Release-Engineering-Team

Oct 9 2025

amastilovic created T406874: airflow-dags shorten build/test time.
Oct 9 2025, 2:28 PM · Essential-Work, Data-Platform-SRE (2025.11.07 - 2025.11.28), Data-Engineering (Q2 FY25/26 October 1st - December 31th)

Oct 8 2025

amastilovic added a comment to T406765: Create a new gitlab repository for use with dbt.

I've created https://gitlab.wikimedia.org/repos/data-engineering/dbt

Oct 8 2025, 10:06 PM · Data-Engineering (Q2 FY25/26 October 1st - December 31th)

Oct 7 2025

amastilovic added a comment to T406392: failed to push docker-registry.discovery.wmnet/repos/data-engineering/airflow-dags:airflow-2.10.5-py3.11-2025-10-03-192132-3003d4328df66a0086a350fdd2ba1dbd80a235c5: unknown: blob upload invalid.

@amastilovic - Have you subsequently been able to successfully build / push the docker-registry.discovery.wmnet/repos/data-engineering/airflow-dags image?

Oct 7 2025, 8:36 PM · serviceops, GitLab (CI & Job Runners)
amastilovic created T406636: Create a dbt Docker container.
Oct 7 2025, 6:49 PM · Data-Platform-SRE (2025.11.07 - 2025.11.28), OKR-Work, Data-Engineering (Q2 FY25/26 October 1st - December 31th)
amastilovic created T406634: Set up a working, usable dbt installation on stat boxes.
Oct 7 2025, 6:32 PM · Data-Platform-SRE (2025.11.07 - 2025.11.28), OKR-Work, Data-Engineering (Q2 FY25/26 October 1st - December 31th)
amastilovic closed T401325: mediawiki_history - account for temp accounts in mediawiki_user_history_check_error as Resolved.
Oct 7 2025, 1:16 AM · Movement-Insights, Data-Engineering (Q1 FY25/26 July 1st - September 30th)

Sep 26 2025

amastilovic created T405780: Update Gobblin documentation.
Sep 26 2025, 5:50 PM · Data-Engineering (Q2 FY25/26 October 1st - December 31th)

Sep 25 2025

amastilovic added a comment to T405360: Implement an Airflow operator for moving data from point A to B.

This all feels very achievable, but I wonder if we might be making things difficult for ourselves by trying to define one operator that can do it all, like a single Swiss Army knife.

Sep 25 2025, 7:24 PM · Data-Platform-SRE (2025.11.07 - 2025.11.28), Wikimedia Enterprise - Content Integrity, Data-Engineering (Q2 FY25/26 October 1st - December 31th), Wikimedia Enterprise, Essential-Work

Sep 23 2025

amastilovic added a comment to T405379: Clean up artifacts.yaml.

This might be quite onerous on ops week duty and/or folks just trying to upgrade or deploy their job.

We have that manual forced cache warmup for precisely this scenario by the way.

Sep 23 2025, 8:01 PM · Data-Engineering (Q2 FY25/26 October 1st - December 31th)
amastilovic added a comment to T405360: Implement an Airflow operator for moving data from point A to B.

Would data sizes be any concern? One of our use cases is a weekly transfer of a few hundred GB spread across ~10k files in a nested directory structure.

Sep 23 2025, 3:50 PM · Data-Platform-SRE (2025.11.07 - 2025.11.28), Wikimedia Enterprise - Content Integrity, Data-Engineering (Q2 FY25/26 October 1st - December 31th), Wikimedia Enterprise, Essential-Work
amastilovic added a comment to T405360: Implement an Airflow operator for moving data from point A to B.

Blunderbuss could easily do this for you, with minimal resource usage on the Airflow executor side :-)

Sep 23 2025, 2:42 PM · Data-Platform-SRE (2025.11.07 - 2025.11.28), Wikimedia Enterprise - Content Integrity, Data-Engineering (Q2 FY25/26 October 1st - December 31th), Wikimedia Enterprise, Essential-Work

Sep 18 2025

amastilovic added a comment to T404735: Fix CommonsCategoryGraphBuilder to reflect latest changes to categorylinks table.

The 2025-08 backfill run of the DAG has completed successfully, and judging by data sizes on HDFS I'd say it falls in line with what we've seen in the previous months. @GFontenelle_WMF if you have some basic validation checks to run on this data, now would be a good time. Thank you!

Sep 18 2025, 2:36 AM · Data-Engineering (Q1 FY25/26 July 1st - September 30th)

Sep 17 2025

amastilovic added a comment to T404735: Fix CommonsCategoryGraphBuilder to reflect latest changes to categorylinks table.

The new SQL query is:

Sep 17 2025, 2:34 AM · Data-Engineering (Q1 FY25/26 July 1st - September 30th)

Sep 16 2025

amastilovic created T404735: Fix CommonsCategoryGraphBuilder to reflect latest changes to categorylinks table.
Sep 16 2025, 4:01 PM · Data-Engineering (Q1 FY25/26 July 1st - September 30th)

Sep 4 2025

amastilovic added a watcher for Data-Engineering: amastilovic.
Sep 4 2025, 7:50 PM

Aug 29 2025

amastilovic added a comment to T401325: mediawiki_history - account for temp accounts in mediawiki_user_history_check_error.

As far as I can see, all these checkers already use DeequColumnAnalysis so the basic plumbing seems to already be there. I'll have to investigate a bit deeper to see if and how it's being used right now, but hopefully the scope creep won't be too big. I'll keep you updated here.

Aug 29 2025, 4:10 PM · Movement-Insights, Data-Engineering (Q1 FY25/26 July 1st - September 30th)

Aug 28 2025

amastilovic added a comment to T401325: mediawiki_history - account for temp accounts in mediawiki_user_history_check_error.

It seems to me that this growth is organic and should be taken into account when doing the error checks, but the MediawikiHistoryChecker unfortunately doesn't provide for fine-tuning the error boundaries. The way it works right now is, it accepts the minimum and maximum boundaries for any kind of growth. These same two boundaries are then used for user growth, page growth, denormalized history growth and reduced history growth checks.
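
To make the limitation concrete, here is a minimal Python sketch of the behaviour described above. It is not the actual MediawikiHistoryChecker code: the class `GrowthBounds`, the function `run_checks`, the check names and every threshold value are hypothetical and chosen only for illustration. It contrasts one shared pair of boundaries with a separate pair per check.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GrowthBounds:
    """Inclusive limits on relative growth between two snapshots (hypothetical)."""
    minimum: float
    maximum: float

    def accepts(self, previous: int, current: int) -> bool:
        if previous <= 0:
            raise ValueError("previous snapshot count must be positive")
        growth = (current - previous) / previous
        return self.minimum <= growth <= self.maximum


def run_checks(
    counts: dict[str, tuple[int, int]],
    bounds: dict[str, GrowthBounds],
) -> dict[str, bool]:
    """Return, per check name, whether the observed growth is within bounds."""
    return {
        name: bounds[name].accepts(previous, current)
        for name, (previous, current) in counts.items()
    }


CHECKS = ("user", "page", "denormalized", "reduced")

# Behaviour as described in the comment: one pair of boundaries for all checks.
shared = GrowthBounds(minimum=-0.01, maximum=0.05)  # illustrative values
shared_bounds = {name: shared for name in CHECKS}

# Hypothetical alternative: loosen only the user growth boundaries.
per_check_bounds = dict(shared_bounds, user=GrowthBounds(-0.01, 0.20))

# Made-up (previous, current) row counts; user growth is +10%.
counts = {
    "user": (1_000, 1_100),
    "page": (1_000, 1_020),
    "denormalized": (1_000, 1_030),
    "reduced": (1_000, 1_030),
}

shared_result = run_checks(counts, shared_bounds)
per_check_result = run_checks(counts, per_check_bounds)
```

With the shared boundaries the user check fails while the other three pass; with per-check boundaries the user check can be relaxed without loosening the page and history checks.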

Aug 28 2025, 10:06 PM · Movement-Insights, Data-Engineering (Q1 FY25/26 July 1st - September 30th)
amastilovic closed T397923: Adapt Sqoop for categorylinks schema change, a subtask of T385890: Add support for read new for categorylinks migration, as Resolved.
Aug 28 2025, 7:04 PM · MW-1.45-notes (1.45.0-wmf.17; 2025-09-02), Patch-For-Review, MW-1.44-notes (1.44.0-wmf.18; 2025-02-25), DBA
amastilovic closed T397923: Adapt Sqoop for categorylinks schema change as Resolved.
Aug 28 2025, 7:03 PM · Patch-For-Review, Data-Engineering (Q1 FY25/26 July 1st - September 30th)

Aug 27 2025

amastilovic reopened T397923: Adapt Sqoop for categorylinks schema change, a subtask of T385890: Add support for read new for categorylinks migration, as In Progress.
Aug 27 2025, 5:32 PM · MW-1.45-notes (1.45.0-wmf.17; 2025-09-02), Patch-For-Review, MW-1.44-notes (1.44.0-wmf.18; 2025-02-25), DBA
amastilovic reopened T397923: Adapt Sqoop for categorylinks schema change as "In Progress".
Aug 27 2025, 5:31 PM · Patch-For-Review, Data-Engineering (Q1 FY25/26 July 1st - September 30th)

Aug 26 2025

amastilovic renamed T400283: Clean up airflow-dags gitlab-ci.yaml CI/CD pipelines from Streamline and clean up airflow-dags gitlab-ci.yaml CI/CD pipelines to Clean up airflow-dags gitlab-ci.yaml CI/CD pipelines.
Aug 26 2025, 5:32 PM · Data-Engineering (Q2 FY25/26 October 1st - December 31th)

Aug 19 2025

amastilovic added a comment to T402323: Blunderbuss: Move Hadoop/HDFS XML configuration into Helm deployment chart.

Use the following example of how these values should be saved in the deployment-charts repository: https://github.com/wikimedia/operations-deployment-charts/blob/master/helmfile.d/dse-k8s-services/_airflow_common_/values-analytics-production.yaml

Aug 19 2025, 4:47 PM · Data-Platform-SRE (2025.11.07 - 2025.11.28), Data-Engineering (Q2 FY25/26 October 1st - December 31th), Essential-Work
amastilovic created T402323: Blunderbuss: Move Hadoop/HDFS XML configuration into Helm deployment chart.
Aug 19 2025, 4:46 PM · Data-Platform-SRE (2025.11.07 - 2025.11.28), Data-Engineering (Q2 FY25/26 October 1st - December 31th), Essential-Work
amastilovic closed T402204: airflow-dags: Move Dockerfile linter to standard linter image and stage as Resolved.
Aug 19 2025, 3:26 PM · Data-Engineering (Q1 FY25/26 July 1st - September 30th)
amastilovic moved T402204: airflow-dags: Move Dockerfile linter to standard linter image and stage from Next Up to Done on the Data-Engineering (Q1 FY25/26 July 1st - September 30th) board.
Aug 19 2025, 3:26 PM · Data-Engineering (Q1 FY25/26 July 1st - September 30th)
amastilovic created T402315: Investigate why Blunderbuss cache artifacts can have different file permissions.
Aug 19 2025, 3:10 PM · Data-Engineering (Q1 FY25/26 July 1st - September 30th)

Aug 18 2025

amastilovic added a comment to T402204: airflow-dags: Move Dockerfile linter to standard linter image and stage.

https://gitlab.wikimedia.org/repos/data-engineering/airflow-dags/-/merge_requests/1609

Aug 18 2025, 10:00 PM · Data-Engineering (Q1 FY25/26 July 1st - September 30th)
amastilovic created T402204: airflow-dags: Move Dockerfile linter to standard linter image and stage.
Aug 18 2025, 6:17 PM · Data-Engineering (Q1 FY25/26 July 1st - September 30th)

Aug 15 2025

amastilovic added a comment to T401854: Add a YAML linter to airflow-dags GitLab CI/CD.

https://gitlab.wikimedia.org/repos/data-engineering/airflow-dags/-/merge_requests/1604

Aug 15 2025, 5:43 PM · Data-Engineering (Q1 FY25/26 July 1st - September 30th)
amastilovic closed T401709: Turn dependency on past tasks off for Gobblin DAGs as Resolved.
Aug 15 2025, 5:43 PM · Data-Engineering (Q1 FY25/26 July 1st - September 30th)
amastilovic updated the task description for T370368: Gobblin-wmf Gitlab migration and maintenance.
Aug 15 2025, 12:48 AM · Data-Engineering (Q1 FY25/26 July 1st - September 30th), Essential-Work, Event-Platform
amastilovic updated the task description for T401974: Integrate Gobblin-wmf into SonarQube.
Aug 15 2025, 12:48 AM · Essential-Work, Test-Platform (dek du (Current Sprint))
amastilovic created T401974: Integrate Gobblin-wmf into SonarQube.
Aug 15 2025, 12:48 AM · Essential-Work, Test-Platform (dek du (Current Sprint))

Aug 13 2025

amastilovic created T401854: Add a YAML linter to airflow-dags GitLab CI/CD.
Aug 13 2025, 6:18 PM · Data-Engineering (Q1 FY25/26 July 1st - September 30th)
amastilovic added a comment to T401694: Make Java 11 available in our Debian Bookworm repos.

Sorry for the confusion! I've updated the task description to be more clear. The packages will be built with the jdk and jre; see the current jdk8 Debian repo component for an example of what will be available.

Aug 13 2025, 4:33 PM · Infrastructure-Foundations, Epic, Data-Platform-SRE

Aug 12 2025

amastilovic closed T397923: Adapt Sqoop for categorylinks schema change as Resolved.
Aug 12 2025, 9:04 PM · Patch-For-Review, Data-Engineering (Q1 FY25/26 July 1st - September 30th)
amastilovic closed T397923: Adapt Sqoop for categorylinks schema change, a subtask of T385890: Add support for read new for categorylinks migration, as Resolved.
Aug 12 2025, 9:04 PM · MW-1.45-notes (1.45.0-wmf.17; 2025-09-02), Patch-For-Review, MW-1.44-notes (1.44.0-wmf.18; 2025-02-25), DBA
amastilovic added a comment to T401694: Make Java 11 available in our Debian Bookworm repos.

Another vote for JRE version, too.

Aug 12 2025, 5:42 PM · Infrastructure-Foundations, Epic, Data-Platform-SRE
amastilovic changed the status of T401709: Turn dependency on past tasks off for Gobblin DAGs from Open to In Progress.
Aug 12 2025, 3:49 PM · Data-Engineering (Q1 FY25/26 July 1st - September 30th)
amastilovic created T401709: Turn dependency on past tasks off for Gobblin DAGs.
Aug 12 2025, 3:49 PM · Data-Engineering (Q1 FY25/26 July 1st - September 30th)

Aug 11 2025

amastilovic closed T399958: Inform Data-Engineering about removal of cuc_ip, cule_ip, and cupe_ip as Resolved.
Aug 11 2025, 11:15 PM · Data-Engineering (Q1 FY25/26 July 1st - September 30th), Trust and Safety Product Team, CheckUser
amastilovic closed T399958: Inform Data-Engineering about removal of cuc_ip, cule_ip, and cupe_ip, a subtask of T363487: Remove the cuc_ip, cule_ip, and cupe_ip columns from the cu_changes, cu_log_event, and cu_private_event tables respectively as duplicated to the IP hex columns, as Resolved.
Aug 11 2025, 11:15 PM · Product Safety and Integrity, Essential-Work, Schema-change, Data-Persistence (work done), CheckUser
amastilovic added a comment to T397923: Adapt Sqoop for categorylinks schema change.

I approved the MR above.
And actually while we're at it, we should review if the list here https://github.com/wikimedia/analytics-refinery/blob/master/bin/refinery-drop-mediawiki-snapshots#L85 matches the list of tables we sqoop!

Aug 11 2025, 2:46 PM · Patch-For-Review, Data-Engineering (Q1 FY25/26 July 1st - September 30th)

Aug 6 2025

amastilovic added a comment to T401325: mediawiki_history - account for temp accounts in mediawiki_user_history_check_error.

Hi @amastilovic, could you look into this? Priority is high, but it doesn't need to be fixed-fixed until the end of August, before the next monthly snapshot runs.

Thank you!

Aug 6 2025, 10:08 PM · Movement-Insights, Data-Engineering (Q1 FY25/26 July 1st - September 30th)
amastilovic closed T400411: Enable forced cache warmup option for airflow-dags blunderbuss integration as Resolved.
Aug 6 2025, 10:07 PM · Data-Engineering (Q1 FY25/26 July 1st - September 30th)

Aug 5 2025

amastilovic closed T401134: Write a little Artifact deployment paragraph in Airflow Wikitech docs as Resolved.
Aug 5 2025, 2:07 PM · Q1 FY2025/26 July 1st - September 30th

Aug 4 2025

amastilovic updated the task description for T401134: Write a little Artifact deployment paragraph in Airflow Wikitech docs.
Aug 4 2025, 10:02 PM · Q1 FY2025/26 July 1st - September 30th
amastilovic changed the status of T401134: Write a little Artifact deployment paragraph in Airflow Wikitech docs from Open to In Progress.
Aug 4 2025, 9:52 PM · Q1 FY2025/26 July 1st - September 30th
amastilovic added a comment to T400188: Increase the capacity of /var/cache/archiva on the appropriate archiva.wikimedia.org server(s).

Thank you, @BTullis !!

Aug 4 2025, 5:19 PM · Data-Platform-SRE (2025.07.26 - 2025.08.15)
amastilovic created T401134: Write a little Artifact deployment paragraph in Airflow Wikitech docs.
Aug 4 2025, 5:03 PM · Q1 FY2025/26 July 1st - September 30th
amastilovic moved T370368: Gobblin-wmf Gitlab migration and maintenance from Next Up to In progress on the Data-Engineering (Q1 FY25/26 July 1st - September 30th) board.
Aug 4 2025, 3:27 PM · Data-Engineering (Q1 FY25/26 July 1st - September 30th), Essential-Work, Event-Platform
amastilovic moved T384726: Implement full parity between HiveSensor and RESTExternalTaskSensor from Next Up to In progress on the Data-Engineering (Q1 FY25/26 July 1st - September 30th) board.
Aug 4 2025, 3:26 PM · Data-Engineering (Q2 FY25/26 October 1st - December 31th)
amastilovic moved T348958: Bump memory to enable large artifacts sync on HDFS from Next Up to In progress on the Data-Engineering (Q1 FY25/26 July 1st - September 30th) board.
Aug 4 2025, 3:24 PM · Data-Engineering (Q1 FY25/26 July 1st - September 30th), Structured-Data-Backlog
amastilovic moved T400283: Clean up airflow-dags gitlab-ci.yaml CI/CD pipelines from Next Up to In progress on the Data-Engineering (Q1 FY25/26 July 1st - September 30th) board.
Aug 4 2025, 3:24 PM · Data-Engineering (Q2 FY25/26 October 1st - December 31th)
amastilovic moved T400411: Enable forced cache warmup option for airflow-dags blunderbuss integration from Next Up to In Review on the Data-Engineering (Q1 FY25/26 July 1st - September 30th) board.
Aug 4 2025, 3:23 PM · Data-Engineering (Q1 FY25/26 July 1st - September 30th)
amastilovic moved T399958: Inform Data-Engineering about removal of cuc_ip, cule_ip, and cupe_ip from In progress to In Review on the Data-Engineering (Q1 FY25/26 July 1st - September 30th) board.
Aug 4 2025, 3:23 PM · Data-Engineering (Q1 FY25/26 July 1st - September 30th), Trust and Safety Product Team, CheckUser
amastilovic moved T397923: Adapt Sqoop for categorylinks schema change from In progress to In Review on the Data-Engineering (Q1 FY25/26 July 1st - September 30th) board.
Aug 4 2025, 3:23 PM · Patch-For-Review, Data-Engineering (Q1 FY25/26 July 1st - September 30th)
amastilovic moved T399958: Inform Data-Engineering about removal of cuc_ip, cule_ip, and cupe_ip from Next Up to In progress on the Data-Engineering (Q1 FY25/26 July 1st - September 30th) board.
Aug 4 2025, 3:22 PM · Data-Engineering (Q1 FY25/26 July 1st - September 30th), Trust and Safety Product Team, CheckUser
amastilovic moved T397923: Adapt Sqoop for categorylinks schema change from Urgent to In progress on the Data-Engineering (Q1 FY25/26 July 1st - September 30th) board.
Aug 4 2025, 3:21 PM · Patch-For-Review, Data-Engineering (Q1 FY25/26 July 1st - September 30th)
amastilovic moved T370665: Refine to Hive with Airflow – Handle Late-Arrived Events from In progress to Done on the Data-Engineering (Q1 FY25/26 July 1st - September 30th) board.
Aug 4 2025, 3:15 PM · Data-Engineering (Q1 FY25/26 July 1st - September 30th), Patch-For-Review

Jul 30 2025

amastilovic added a comment to T399958: Inform Data-Engineering about removal of cuc_ip, cule_ip, and cupe_ip.

OK so I've done some investigation work and here's what I think needs to be done:

Jul 30 2025, 11:51 PM · Data-Engineering (Q1 FY25/26 July 1st - September 30th), Trust and Safety Product Team, CheckUser
amastilovic closed T388861: test_produced_by_config SLA miss configured to be too small for upstream dataset run time as Resolved.
Jul 30 2025, 9:42 PM · Data-Engineering (Q1 FY25/26 July 1st - September 30th)
amastilovic added a comment to T388861: test_produced_by_config SLA miss configured to be too small for upstream dataset run time.

Agreed, I think this is WAD.

Sounds good, closing the ticket.

Jul 30 2025, 9:41 PM · Data-Engineering (Q1 FY25/26 July 1st - September 30th)

Jul 29 2025

amastilovic added a comment to T382430: Create a GitLab CI/CD Component project for WMF CI/CD templates and components.

I heard @amastilovic say today that GitLab CI components can't be used for manual CI job runs? If so, that is quite a con and maybe a reason not to use them?

Jul 29 2025, 9:51 PM · Release-Engineering-Team (Radar), Data-Engineering

Jul 28 2025

amastilovic added a comment to T400393: FAIL: refinery-drop-raw-event alerting.

I can't explain why it didn't hit the permission errors when I ran it manually.

Jul 28 2025, 10:07 PM · Data-Engineering (Q1 FY25/26 July 1st - September 30th), Data-Platform-SRE (2025.07.26 - 2025.08.15)

Jul 25 2025

amastilovic claimed T348958: Bump memory to enable large artifacts sync on HDFS.
Jul 25 2025, 5:57 PM · Data-Engineering (Q1 FY25/26 July 1st - September 30th), Structured-Data-Backlog
amastilovic added a comment to T400188: Increase the capacity of /var/cache/archiva on the appropriate archiva.wikimedia.org server(s).

Could the solution be to mount /var/cache/archiva onto a new volume?

Yes, we can definitely do this. I'll add a new disk of, say, 200 GB and mount this to /var/cache/archiva.
I can sync a copy of the existing data to the new disk ahead of time, which will minimize the time that the archiva service has to be down.

Jul 25 2025, 4:10 PM · Data-Platform-SRE (2025.07.26 - 2025.08.15)
amastilovic added a comment to T400188: Increase the capacity of /var/cache/archiva on the appropriate archiva.wikimedia.org server(s).

Oh it's /var/cache/archiva - it even says so in the title of the ticket!

Jul 25 2025, 4:06 PM · Data-Platform-SRE (2025.07.26 - 2025.08.15)