mforns (Marcel Ruiz Forns)
Software Engineer @ Analytics

User Details

User Since: Nov 7 2014, 8:52 PM (375 w, 4 d)
Availability: Available
IRC Nick: mforns
LDAP User: Mforns
MediaWiki User: Unknown

Recent Activity

Tue, Jan 11

mforns created T298972: [Airflow] Add deletion job for old anomaly detection data.
Tue, Jan 11, 2:09 PM · Data-Engineering-Kanban, Patch-For-Review, Airflow, Data-Engineering

Mon, Jan 10

mforns created T298893: [Airflow] Troubleshoot MySQL connection issues.
Mon, Jan 10, 4:39 PM · Airflow, Data-Engineering
mforns lowered the priority of T202312: Transform EventLoggingToDruid job to read schemas to ingest from an allowlist and process them all from Medium to Low.

@odimitrijevic No, I don't think this has been implemented.
It would be cool to have it managed by an allowlist, but at the same time I'm not sure how many more datasets we are going to have that use this tool.
So reprioritizing it to low, and in case we need it, we can resurface it.

Mon, Jan 10, 11:07 AM · Data-Engineering
mforns renamed T202312: Transform EventLoggingToDruid job to read schemas to ingest from an allowlist and process them all from Transform EventLoggingToDruid job to read schemas to ingest from a whitelist and process them all to Transform EventLoggingToDruid job to read schemas to ingest from an allowlist and process them all.
Mon, Jan 10, 10:26 AM · Data-Engineering

Fri, Dec 24

mforns moved T295201: [Airflow] Migrate anomaly detection DAG to airflow-dags repo. from In Progress to Done on the Data-Engineering-Kanban board.
Fri, Dec 24, 8:29 PM · Patch-For-Review, Data-Engineering-Kanban, Airflow, Data-Engineering

Dec 13 2021

mforns added a comment to T296543: Tooling for Deploying Conda Environments.

PYSPARK_PYTHON=conda_env/bin/python spark2-submit --master yarn --deploy-mode cluster --archives=hdfs:///user/otto/conda_env.tgz#conda_env hdfs:///user/otto/call.py 'test_project.spark:main'

This looks great!

Dec 13 2021, 4:28 PM · Data-Engineering-Kanban, Airflow

Dec 8 2021

mforns added a comment to T294024: [Airflow] Automate sync'ing archiva packages to HDFS.

@Ottomata That's amazing!

Dec 8 2021, 2:32 PM · Data-Engineering-Kanban, Airflow, Data-Engineering

Dec 1 2021

mforns added a comment to T296543: Tooling for Deploying Conda Environments.
  • Get the conda env artifact
  • unpack it locally
  • launch an executable inside the conda env
Dec 1 2021, 1:45 PM · Data-Engineering-Kanban, Airflow
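
The three steps in the comment above (get the artifact, unpack it, launch an executable inside it) can be sketched in Python roughly as follows. This is only an illustration, not the tooling discussed in the task: the helper names `unpack_env` and `run_in_env` are hypothetical, the artifact is assumed to be a gzipped tarball that has already been fetched to a local path, and executables are assumed to live under `bin/` inside the unpacked environment.

```python
import subprocess
import tarfile
from pathlib import Path


def unpack_env(archive: Path, dest: Path) -> Path:
    """Unpack a gzipped env tarball into dest and return dest.

    Assumption: the archive is trusted and was already fetched locally.
    """
    dest.mkdir(parents=True, exist_ok=True)
    with tarfile.open(archive, "r:gz") as tar:
        tar.extractall(dest)
    return dest


def run_in_env(env_dir: Path, executable: str, *args: str) -> str:
    """Run env_dir/bin/<executable> with args and return its stdout.

    Assumption: the env keeps its executables under bin/.
    Raises subprocess.CalledProcessError on a non-zero exit code.
    """
    cmd = [str(env_dir / "bin" / executable), *args]
    result = subprocess.run(cmd, check=True, capture_output=True, text=True)
    return result.stdout
```

For example, `run_in_env(unpack_env(Path("conda_env.tgz"), Path("conda_env")), "python", "--version")` would unpack a local archive and call the interpreter shipped inside it; both paths here are placeholders.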

Nov 26 2021

mforns added a project to T296523: Refine to Airflow Migration: User Story: Epic.
Nov 26 2021, 2:27 PM · Epic, Airflow
mforns added a parent task for T295201: [Airflow] Migrate anomaly detection DAG to airflow-dags repo.: T296525: Migrate POC Jobs to Airflow.
Nov 26 2021, 2:25 PM · Patch-For-Review, Data-Engineering-Kanban, Airflow, Data-Engineering
mforns added a subtask for T296525: Migrate POC Jobs to Airflow: T295201: [Airflow] Migrate anomaly detection DAG to airflow-dags repo..
Nov 26 2021, 2:25 PM · Airflow

Nov 18 2021

mforns moved T294026: [Airflow] Create repository for Airflow DAGs from Next Up to Done on the Data-Engineering-Kanban board.
Nov 18 2021, 6:23 PM · Airflow, Data-Engineering-Kanban, Data-Engineering
mforns moved T290303: Migrate WikibaseTermboxInteraction EventLogging Schema to new EventPlatform thingy from Next Up to In Progress on the Data-Engineering-Kanban board.
Nov 18 2021, 6:23 PM · Analytics, Data-Engineering-Kanban, Data-Engineering, Wikidata-Termbox, Wikidata-Campsite, wdwb-tech, Wikidata, Event-Platform
mforns moved T295352: Add user accounts to LDAP group `analytics-privatedata-users` from Next Up to In Progress on the Data-Engineering-Kanban board.
Nov 18 2021, 6:23 PM · Data-Engineering-Kanban, Data-Engineering, Analytics
mforns moved T287402: Data structuring guidance request from Next Up to In Progress on the Data-Engineering-Kanban board.
Nov 18 2021, 6:22 PM · Data-Engineering-Kanban, Data-Engineering
mforns moved T295201: [Airflow] Migrate anomaly detection DAG to airflow-dags repo. from Next Up to In Progress on the Data-Engineering-Kanban board.
Nov 18 2021, 6:22 PM · Patch-For-Review, Data-Engineering-Kanban, Airflow, Data-Engineering
mforns moved T295202: [Airflow] Create a tool for easily spinning up a test Airflow instance from Next Up to In Code Review on the Data-Engineering-Kanban board.
Nov 18 2021, 6:22 PM · Data-Engineering-Kanban, Airflow, Data-Engineering

Nov 17 2021

mforns added a comment to T295814: Airflow development instances should be available on demand.

This is the data-engineering task to create that script: T295202.

Nov 17 2021, 3:35 PM · Generated Data Platform
mforns added a comment to T295812: Data pipelines should be published to archiva.

Why not! I don't think it would be super hard, no?
The important thing, I guess, is to come up with a set of guidelines on what a packaged env should look like.
If a package fulfills that, then the place where it comes from doesn't matter that much, no?

Nov 17 2021, 3:32 PM · Generated Data Platform
mforns added a comment to T295807: Production Airflow dags should be moved to the shared repo.

We're currently working on the dependency management script that will sync dependencies from archiva (and GitLab?) to HDFS.
I guess this is a blocker for this task, so I'll ping you ASAP when it's done.
Also, from this effort we'll come up with a format for specifying dependencies in a config file.
I'll loop you guys in on the corresponding code reviews, so you can give your thoughts on that.

Nov 17 2021, 3:28 PM · Generated Data Platform

Nov 15 2021

mforns claimed T295202: [Airflow] Create a tool for easily spinning up a test Airflow instance.
Nov 15 2021, 2:29 PM · Data-Engineering-Kanban, Airflow, Data-Engineering
mforns moved T295202: [Airflow] Create a tool for easily spinning up a test Airflow instance from Next Up to In Progress on the Airflow board.
Nov 15 2021, 2:28 PM · Data-Engineering-Kanban, Airflow, Data-Engineering

Nov 9 2021

mforns moved T295380: [Airflow] Set up scap deployment from Backlog to In Progress on the Airflow board.
Nov 9 2021, 5:16 PM · Patch-For-Review, Data-Engineering-Kanban, Airflow, Data-Engineering
mforns created T295380: [Airflow] Set up scap deployment.
Nov 9 2021, 4:38 PM · Patch-For-Review, Data-Engineering-Kanban, Airflow, Data-Engineering

Nov 8 2021

mforns added a comment to T294026: [Airflow] Create repository for Airflow DAGs.

Hey @razzi :] responding here:

Nov 8 2021, 7:27 PM · Airflow, Data-Engineering-Kanban, Data-Engineering
mforns added a project to T282033: Airflow collaborations: Epic.
Nov 8 2021, 4:57 PM · Epic, Airflow, Data-Engineering, Platform Team Workboards (Image Suggestion API)
mforns moved T271429: Replace Oozie with better workflow scheduler from Incoming to Transform on the Data-Engineering board.
Nov 8 2021, 4:57 PM · Airflow, Data-Engineering, Epic, Product-Analytics
mforns moved T282033: Airflow collaborations from Incoming to Transform on the Data-Engineering board.
Nov 8 2021, 4:57 PM · Epic, Airflow, Data-Engineering, Platform Team Workboards (Image Suggestion API)

Nov 5 2021

mforns triaged T295204: [Airflow] Organize hackathon as High priority.
Nov 5 2021, 9:49 PM · Data-Engineering-Kanban, Airflow, Data-Engineering
mforns triaged T295202: [Airflow] Create a tool for easily spinning up a test Airflow instance as High priority.
Nov 5 2021, 9:49 PM · Data-Engineering-Kanban, Airflow, Data-Engineering
mforns triaged T295045: Allow a shared, protected runner for the data-engineering group in GitLab as Medium priority.
Nov 5 2021, 9:48 PM · GitLab (CI & Job Runners), Airflow, Data-Engineering-Kanban
mforns moved T295204: [Airflow] Organize hackathon from Backlog to Next Up on the Airflow board.
Nov 5 2021, 9:48 PM · Data-Engineering-Kanban, Airflow, Data-Engineering
mforns created T295204: [Airflow] Organize hackathon.
Nov 5 2021, 9:47 PM · Data-Engineering-Kanban, Airflow, Data-Engineering
mforns moved T288271: Make it possible to use anaconda + stacked conda envs for Airflow executors from Backlog to Next Up on the Airflow board.
Nov 5 2021, 9:26 PM · Data-Engineering-Kanban, Airflow, Data-Engineering
mforns moved T294024: [Airflow] Automate sync'ing archiva packages to HDFS from Backlog to Next Up on the Airflow board.
Nov 5 2021, 9:26 PM · Data-Engineering-Kanban, Airflow, Data-Engineering
mforns moved T295202: [Airflow] Create a tool for easily spinning up a test Airflow instance from Backlog to Next Up on the Airflow board.
Nov 5 2021, 9:26 PM · Data-Engineering-Kanban, Airflow, Data-Engineering
mforns placed T294024: [Airflow] Automate sync'ing archiva packages to HDFS up for grabs.
Nov 5 2021, 9:25 PM · Data-Engineering-Kanban, Airflow, Data-Engineering
mforns created T295202: [Airflow] Create a tool for easily spinning up a test Airflow instance.
Nov 5 2021, 9:25 PM · Data-Engineering-Kanban, Airflow, Data-Engineering
mforns moved T295201: [Airflow] Migrate anomaly detection DAG to airflow-dags repo. from Backlog to In Progress on the Airflow board.
Nov 5 2021, 9:20 PM · Patch-For-Review, Data-Engineering-Kanban, Airflow, Data-Engineering
mforns created T295201: [Airflow] Migrate anomaly detection DAG to airflow-dags repo..
Nov 5 2021, 9:20 PM · Patch-For-Review, Data-Engineering-Kanban, Airflow, Data-Engineering
mforns updated the task description for T288254: Explore Containerization Solutions for DE Applications.
Nov 5 2021, 9:14 PM · Spike, Airflow, Data-Engineering
mforns added a project to T288254: Explore Containerization Solutions for DE Applications: Spike.
Nov 5 2021, 9:14 PM · Spike, Airflow, Data-Engineering
mforns removed a project from T288254: Explore Containerization Solutions for DE Applications: Spike.
Nov 5 2021, 9:12 PM · Spike, Airflow, Data-Engineering
mforns moved T288263: Airflow MVP from Backlog to Epics on the Airflow board.
Nov 5 2021, 9:10 PM · Airflow, Data-Engineering-Kanban, Epic, Data-Engineering
mforns moved T282033: Airflow collaborations from Backlog to Epics on the Airflow board.
Nov 5 2021, 9:10 PM · Epic, Airflow, Data-Engineering, Platform Team Workboards (Image Suggestion API)
mforns moved T271429: Replace Oozie with better workflow scheduler from Backlog to Epics on the Airflow board.
Nov 5 2021, 9:10 PM · Airflow, Data-Engineering, Epic, Product-Analytics
mforns moved T295199: [Airflow] User manual and documentation from Backlog to Epics on the Airflow board.
Nov 5 2021, 9:09 PM · Data-Engineering-Kanban, Epic, Airflow, Data-Engineering
mforns created T295199: [Airflow] User manual and documentation.
Nov 5 2021, 9:08 PM · Data-Engineering-Kanban, Epic, Airflow, Data-Engineering
mforns moved T285692: Write a job entirely in Airflow with spark and/or sparkSQL from Backlog to Done on the Airflow board.
Nov 5 2021, 8:17 PM · Airflow, Data-Engineering, Data-Engineering-Kanban, Patch-For-Review
mforns moved T294026: [Airflow] Create repository for Airflow DAGs from Backlog to Done on the Airflow board.
Nov 5 2021, 8:17 PM · Airflow, Data-Engineering-Kanban, Data-Engineering
mforns added a comment to T294026: [Airflow] Create repository for Airflow DAGs.

This was done a couple days ago:
https://gitlab.wikimedia.org/people/wmf-team-data-engineering

Nov 5 2021, 8:17 PM · Airflow, Data-Engineering-Kanban, Data-Engineering
mforns moved T295045: Allow a shared, protected runner for the data-engineering group in GitLab from Backlog to In Progress on the Airflow board.
Nov 5 2021, 8:16 PM · GitLab (CI & Job Runners), Airflow, Data-Engineering-Kanban
mforns updated subscribers of T263157: Process to check approximate correctness of analytics pipeline outputs.

@EBernhardson (cc @gmodena since he's asked about this before)

Nov 5 2021, 8:11 PM · Discovery-Search
mforns edited projects for T282035: Catalog, Categorize, and Templetize existing scheduled workflows, added: Data-Engineering, Airflow; removed Analytics.
Nov 5 2021, 5:33 PM · Airflow, Data-Engineering, Platform Engineering
mforns edited projects for T271429: Replace Oozie with better workflow scheduler, added: Data-Engineering, Airflow; removed Analytics.
Nov 5 2021, 5:01 PM · Airflow, Data-Engineering, Epic, Product-Analytics
mforns added a project to T288254: Explore Containerization Solutions for DE Applications: Airflow.
Nov 5 2021, 4:48 PM · Spike, Airflow, Data-Engineering
mforns added a comment to T292743: Create Code Repo and Structure.

Hi @lbowmaker and @gmodena :]

Nov 5 2021, 4:47 PM · Generated Data Platform
mforns added a project to T295072: Install spark3: Airflow.
Nov 5 2021, 4:11 PM · Airflow, Data-Engineering
mforns edited projects for T284566: Replace Airflow's HDFS client (snakebite) with pyarrow, added: Data-Engineering, Airflow; removed Analytics.
Nov 5 2021, 3:56 PM · Airflow, Data-Engineering, Platform Engineering
mforns edited projects for T282033: Airflow collaborations, added: Data-Engineering, Airflow; removed Analytics.
Nov 5 2021, 3:53 PM · Epic, Airflow, Data-Engineering, Platform Team Workboards (Image Suggestion API)
mforns added a project to T288263: Airflow MVP: Airflow.
Nov 5 2021, 3:45 PM · Airflow, Data-Engineering-Kanban, Epic, Data-Engineering
mforns edited projects for T288271: Make it possible to use anaconda + stacked conda envs for Airflow executors, added: Airflow; removed Analytics.
Nov 5 2021, 3:43 PM · Data-Engineering-Kanban, Airflow, Data-Engineering
mforns edited projects for T294024: [Airflow] Automate sync'ing archiva packages to HDFS, added: Airflow; removed Analytics.
Nov 5 2021, 3:43 PM · Data-Engineering-Kanban, Airflow, Data-Engineering
mforns removed projects from T294026: [Airflow] Create repository for Airflow DAGs: Analytics-Kanban, Analytics.
Nov 5 2021, 3:43 PM · Airflow, Data-Engineering-Kanban, Data-Engineering
mforns edited projects for T285692: Write a job entirely in Airflow with spark and/or sparkSQL, added: Airflow; removed Analytics-Kanban, Analytics.
Nov 5 2021, 3:43 PM · Airflow, Data-Engineering, Data-Engineering-Kanban, Patch-For-Review
mforns added a project to T294026: [Airflow] Create repository for Airflow DAGs: Airflow.
Nov 5 2021, 3:42 PM · Airflow, Data-Engineering-Kanban, Data-Engineering

Nov 3 2021

mforns added a comment to T294781: Create project tag for Airflow.

@Aklapper Thanks a lot for creating the project!

Nov 3 2021, 6:07 PM · Project-Admins

Nov 1 2021

mforns moved T279952: event.WikipediaPortal referer modification from Transform to Security & Governance on the Data-Engineering board.
Nov 1 2021, 5:01 PM · Data-Engineering, Privacy Engineering, FR-Tech-Analytics
mforns moved T279952: event.WikipediaPortal referer modification from Incoming to Transform on the Data-Engineering board.
Nov 1 2021, 5:01 PM · Data-Engineering, Privacy Engineering, FR-Tech-Analytics
mforns moved T291693: [Session length] Apply different sample rates per wiki from Incoming to Ingest on the Data-Engineering board.
Nov 1 2021, 5:00 PM · Data-Engineering
mforns moved T291905: Allow kafka clients to verify brokers hostnames when using SSL from Incoming to Ops on the Data-Engineering board.
Nov 1 2021, 5:00 PM · Data-Engineering, Patch-For-Review, SRE, Analytics-Radar, Event-Platform
mforns moved T292087: Setup Presto UI in production from Incoming to Visualize on the Data-Engineering board.
Nov 1 2021, 5:00 PM · User-razzi, Data-Engineering-Kanban, Analytics-Kanban, Data-Engineering, Analytics
mforns moved T293195: Add MCR slot information to revision-create events from Incoming to Ingest on the Data-Engineering board.
Nov 1 2021, 4:59 PM · MW-1.38-notes (1.38.0-wmf.9; 2021-11-16), Discovery-Search (Current work), Patch-For-Review, Data-Engineering, Event-Platform, Analytics, Wikidata, Wikidata-Query-Service
mforns moved T293243: Allow users to differentiate their JupyterHub logs in Logstash from Incoming to Visualize on the Data-Engineering board.
Nov 1 2021, 4:59 PM · Data-Engineering
mforns moved T238230: Decommission EventLogging backend components by migrating to MEP from Incoming to Ingest on the Data-Engineering board.
Nov 1 2021, 4:58 PM · Analytics, Data-Engineering, Patch-For-Review, Analytics-EventLogging, Event-Platform
mforns moved T255818: Refine drops $schema field values from Incoming to Transform on the Data-Engineering board.
Nov 1 2021, 4:58 PM · Analytics, Data-Engineering, Patch-For-Review, Event-Platform
mforns moved T259163: Migrate legacy metawiki schemas to Event Platform from Incoming to Ingest on the Data-Engineering board.
Nov 1 2021, 4:58 PM · Analytics, Data-Engineering, Better Use Of Data, Product-Analytics, MW-1.36-notes (1.36.0-wmf.18; 2020-11-17), Product-Data-Infrastructure, Event-Platform
mforns moved T259712: Allow disabling/enabling configured streams via wgEventStreams config from Incoming to Ingest on the Data-Engineering board.
Nov 1 2021, 4:58 PM · Analytics, Data-Engineering, Better Use Of Data, Product-Data-Infrastructure, Platform Team Initiatives (Modern Event Platform (TEC2)), Event-Platform
mforns moved T270431: Switch off skipTrash for some data purging from Incoming to Security & Governance on the Data-Engineering board.
Nov 1 2021, 4:57 PM · Data-Engineering-Kanban, Data-Engineering, Analytics-Kanban
mforns moved T270433: Add logic to purging scripts that requires admin action if it's about to delete a lot of data from Incoming to Security & Governance on the Data-Engineering board.
Nov 1 2021, 4:57 PM · Data-Engineering-Kanban, Data-Engineering, Patch-For-Review, Analytics-Kanban
mforns moved T274607: Add SearchSatisfaction to the allowlist from In Progress to Ready to Deploy on the Data-Engineering-Kanban board.
Nov 1 2021, 4:56 PM · Data-Engineering-Kanban, Data-Engineering, Analytics-Kanban, Analytics, Product-Analytics (Kanban)
mforns moved T274607: Add SearchSatisfaction to the allowlist from Incoming to Ingest on the Data-Engineering board.
Nov 1 2021, 4:56 PM · Data-Engineering-Kanban, Data-Engineering, Analytics-Kanban, Analytics, Product-Analytics (Kanban)
mforns moved T282131: Determine which remaining legacy EventLogging schemas need to be migrated or decommissioned from Incoming to Ingest on the Data-Engineering board.
Nov 1 2021, 4:53 PM · Data-Engineering-Kanban, Data-Engineering, Fundraising-Backlog, Better Use Of Data, Product-Analytics, Product-Data-Infrastructure, Analytics-Kanban, Analytics-EventLogging, Event-Platform
mforns moved T284150: Refactor analytics-meta MariaDB layout to use an-db100[12] from Incoming to Ops on the Data-Engineering board.
Nov 1 2021, 4:52 PM · Data-Engineering-Kanban, Data-Engineering, Analytics-Kanban, Patch-For-Review
mforns moved T286793: [EventGate] Failures when getting stream config from MediaWiki API from Incoming to Ingest on the Data-Engineering board.
Nov 1 2021, 4:51 PM · Data-Engineering
mforns moved T289003: Improve Refine bad data handling from Incoming to Transform on the Data-Engineering board.
Nov 1 2021, 4:50 PM · Data-Engineering-Kanban, Data-Engineering, Analytics-Kanban
mforns moved T290074: Users should run explicit commands to materialize schema versions, rather than using magic git hooks from Incoming to Ingest on the Data-Engineering board.
Nov 1 2021, 4:49 PM · Analytics, Data-Engineering, Patch-For-Review, Event-Platform
mforns moved T290303: Migrate WikibaseTermboxInteraction EventLogging Schema to new EventPlatform thingy from Incoming to Ingest on the Data-Engineering board.
Nov 1 2021, 4:48 PM · Analytics, Data-Engineering-Kanban, Data-Engineering, Wikidata-Termbox, Wikidata-Campsite, wdwb-tech, Wikidata, Event-Platform
mforns moved T291384: Standardize the stats system user uid from Incoming to Ops on the Data-Engineering board.
Nov 1 2021, 4:48 PM · Data-Engineering-Kanban, Data-Engineering, Patch-For-Review, Analytics-Kanban
mforns moved T292389: Automate kerberos credential creation and management to ease the creation of testing infrastructure from Incoming to Ops on the Data-Engineering board.
Nov 1 2021, 4:48 PM · Patch-For-Review, Data-Engineering-Kanban, Data-Engineering, Analytics-Kanban
mforns moved T294024: [Airflow] Automate sync'ing archiva packages to HDFS from Incoming to Transform on the Data-Engineering board.
Nov 1 2021, 4:47 PM · Data-Engineering-Kanban, Airflow, Data-Engineering
mforns moved T294026: [Airflow] Create repository for Airflow DAGs from Incoming to Transform on the Data-Engineering board.
Nov 1 2021, 4:47 PM · Airflow, Data-Engineering-Kanban, Data-Engineering
mforns moved T294046: Write document about "Fast Enough Superset" from Incoming to Visualize on the Data-Engineering board.
Nov 1 2021, 4:46 PM · Data-Engineering-Kanban, Data-Engineering, Analytics-Kanban, Analytics
mforns moved T294246: Sticky header: Add agent_type and access_method to sticky header instrumentation from Incoming to Transform on the Data-Engineering board.
Nov 1 2021, 4:46 PM · MW-1.38-notes (1.38.0-wmf.9; 2021-11-16), Patch-For-Review, Data-Engineering-Kanban, Data-Engineering, Analytics, MediaWiki-extensions-WikimediaEvents, Desktop Improvements, Readers-Web-Backlog (Kanbanana-FY-2021-22)
mforns removed projects from T114124: --- DISCUSSED BELOW ---: Data-Engineering, Analytics-Kanban.
Nov 1 2021, 4:40 PM · Data-Engineering-Kanban, Upstream, Trash
mforns moved T262141: pagecounts-ez of month 2020-08 is incomplete from Incoming to Datasets on the Data-Engineering board.
Nov 1 2021, 4:39 PM · Data-Engineering
mforns moved T264021: ~1 request/minute to intake-logging.wikimedia.org times out at the traffic/service interface from Incoming to Ingest on the Data-Engineering board.
Nov 1 2021, 4:37 PM · Data-Engineering, Data-Engineering-Kanban, Analytics-Kanban, SRE
mforns created T294781: Create project tag for Airflow.
Nov 1 2021, 4:35 PM · Project-Admins
mforns added a parent task for T266640: Decide whether to migrate from Presto to Trino: T294259: Presto/Superset User Experience Improvement.
Nov 1 2021, 4:18 PM · Patch-For-Review, Analytics
mforns added a subtask for T294259: Presto/Superset User Experience Improvement: T266640: Decide whether to migrate from Presto to Trino.
Nov 1 2021, 4:18 PM · Superset, Epic, Data-Engineering-Kanban, Data-Engineering