gmodena (GModena (WMF))
User

User Details

User Since
Nov 2 2020, 1:15 PM (55 w, 5 d)
Availability
Available
LDAP User
Gmodena
MediaWiki User
GModena (WMF)

Recent Activity

Fri, Nov 26

gmodena moved T295360: Data pipelines skeleton should be generated from a template from Work in Progress ⚙️ to QA/Review ❓ on the Generated Data Platform board.
Fri, Nov 26, 10:15 AM · Generated Data Platform
gmodena added a comment to T295360: Data pipelines skeleton should be generated from a template.

Merge request at https://gitlab.wikimedia.org/gmodena/platform-airflow-dags/-/merge_requests/16

Fri, Nov 26, 10:15 AM · Generated Data Platform

Wed, Nov 24

gmodena updated the task description for T296381: Create new GitLab project group: Generated Data Platform.
Wed, Nov 24, 6:19 PM · GitLab (Project Migration), Release-Engineering-Team
gmodena updated the task description for T296381: Create new GitLab project group: Generated Data Platform.
Wed, Nov 24, 3:29 PM · GitLab (Project Migration), Release-Engineering-Team
gmodena created T296381: Create new GitLab project group: Generated Data Platform.
Wed, Nov 24, 12:16 PM · GitLab (Project Migration), Release-Engineering-Team

Tue, Nov 23

gmodena added a comment to T295360: Data pipelines skeleton should be generated from a template.

WIP at https://gitlab.wikimedia.org/gmodena/platform-airflow-dags/-/tree/T295360-datapipeline-scaffolding

Tue, Nov 23, 11:34 AM · Generated Data Platform

Mon, Nov 22

gmodena updated the task description for T295360: Data pipelines skeleton should be generated from a template.
Mon, Nov 22, 11:48 AM · Generated Data Platform

Fri, Nov 19

gmodena renamed T295360: Data pipelines skeleton should be generated from a template from [NEEDS GROOMING] Data pipelines skeleton should be generated from a template to Data pipelines skeleton should be generated from a template.
Fri, Nov 19, 9:52 AM · Generated Data Platform
gmodena moved T295360: Data pipelines skeleton should be generated from a template from Investigate 🔍 to Work in Progress ⚙️ on the Generated Data Platform board.
Fri, Nov 19, 9:52 AM · Generated Data Platform

Thu, Nov 18

gmodena added a comment to T295812: Data pipelines should be published to archiva.

@Ottomata @mforns happy to let you guys drive here.

Thu, Nov 18, 4:33 PM · Generated Data Platform
gmodena added a comment to T292748: [SPIKE] Create Generic Components for Scheduling.
Thu, Nov 18, 9:38 AM · Generated Data Platform
gmodena moved T292748: [SPIKE] Create Generic Components for Scheduling from Work in Progress ⚙️ to QA/Review ❓ on the Generated Data Platform board.
Thu, Nov 18, 8:51 AM · Generated Data Platform

Tue, Nov 16

gmodena created T295814: Airflow development instances should be available on demand.
Tue, Nov 16, 7:47 PM · Generated Data Platform
gmodena created T295812: Data pipelines should be published to archiva.
Tue, Nov 16, 7:43 PM · Generated Data Platform
gmodena created T295807: Production Airflow dags should be moved to the shared repo.
Tue, Nov 16, 7:37 PM · Generated Data Platform
gmodena updated subscribers of T292748: [SPIKE] Create Generic Components for Scheduling.
Tue, Nov 16, 4:34 PM · Generated Data Platform
gmodena added a comment to T292748: [SPIKE] Create Generic Components for Scheduling.

After reviewing our PoC and industry practices, IMHO we should enforce the following:

Tue, Nov 16, 4:02 PM · Generated Data Platform
gmodena updated subscribers of T269778: Introduce API versioning to Sockpuppet.

Not sure where this task belongs now.
Maybe we should close it and re-open it if the similarusers API is revived. @hnowlan would that be ok with you?

Tue, Nov 16, 10:02 AM · Platform Engineering
gmodena removed a project from T269778: Introduce API versioning to Sockpuppet: Platform Team Workboards (Green).
Tue, Nov 16, 9:59 AM · Platform Engineering
gmodena added a comment to T272870: [SPIKE] can we identify alternative sources for near real-time edit data?.

The conversation has since moved on. We can re-open it, if needed, if the API effort is revived.

Tue, Nov 16, 9:58 AM · Platform Engineering
gmodena added a comment to T285494: We should use a unique spelling for neigbhor/neighbour.

Resolving as WONTFIX. We can reuse this task for onboarding.

Tue, Nov 16, 9:57 AM · Platform Engineering
gmodena closed T285494: We should use a unique spelling for neigbhor/neighbour, a subtask of T265722: New Service Request: Sockpuppet Detection, as Resolved.
Tue, Nov 16, 9:56 AM · Platform Engineering, Platform Team Workboards (Green)
gmodena closed T285494: We should use a unique spelling for neigbhor/neighbour as Resolved.
Tue, Nov 16, 9:56 AM · Platform Engineering
gmodena updated subscribers of T277548: Improve robustness of data processing pipeline.
Tue, Nov 16, 9:54 AM · Generated Data Platform, Research
gmodena edited projects for T277548: Improve robustness of data processing pipeline, added: Generated Data Platform; removed Platform Engineering, Platform Team Workboards (Green).
Tue, Nov 16, 9:54 AM · Generated Data Platform, Research
gmodena added a comment to T275566: Provide the capability to generate static endpoint documentation for sockpuppet detection.

@hnowlan when you have a moment, could you maybe have a look at https://gerrit.wikimedia.org/r/c/mediawiki/services/similar-users/+/670848 ?

Tue, Nov 16, 9:53 AM · Platform Engineering, Anti-Harassment, Platform Team Workboards (Green)
gmodena closed T269466: We should document the data ingestion process, a subtask of T265722: New Service Request: Sockpuppet Detection, as Resolved.
Tue, Nov 16, 9:49 AM · Platform Engineering, Platform Team Workboards (Green)
gmodena closed T269466: We should document the data ingestion process as Resolved.
Tue, Nov 16, 9:49 AM · Platform Engineering
gmodena closed T272870: [SPIKE] can we identify alternative sources for near real-time edit data? as Resolved.
Tue, Nov 16, 9:48 AM · Platform Engineering
gmodena closed T272870: [SPIKE] can we identify alternative sources for near real-time edit data?, a subtask of T265722: New Service Request: Sockpuppet Detection, as Resolved.
Tue, Nov 16, 9:48 AM · Platform Engineering, Platform Team Workboards (Green)
gmodena closed T275561: We should document /similarusers response payload, a subtask of T265722: New Service Request: Sockpuppet Detection, as Resolved.
Tue, Nov 16, 9:48 AM · Platform Engineering, Platform Team Workboards (Green)
gmodena closed T275561: We should document /similarusers response payload as Resolved.
Tue, Nov 16, 9:48 AM · Platform Engineering, Anti-Harassment, Platform Team Workboards (Green)
gmodena closed T277453: Don't echo the db connection string at command line prompt , a subtask of T265722: New Service Request: Sockpuppet Detection, as Resolved.
Tue, Nov 16, 9:47 AM · Platform Engineering, Platform Team Workboards (Green)
gmodena closed T277453: Don't echo the db connection string at command line prompt as Resolved.
Tue, Nov 16, 9:46 AM · Platform Engineering, Platform Team Workboards (Green)
gmodena closed T286036: Ingest user similarity data for June 2021, a subtask of T265722: New Service Request: Sockpuppet Detection, as Resolved.
Tue, Nov 16, 9:45 AM · Platform Engineering, Platform Team Workboards (Green)
gmodena closed T286036: Ingest user similarity data for June 2021 as Resolved.
Tue, Nov 16, 9:45 AM · Platform Engineering, Data-Persistence (Consultation), Platform Team Workboards (Green)
gmodena edited projects for T287274: [SPIKE][PLACEHOLDER] we need to estimate the effort required to migrate Similarusers' backend to Cassandra, added: Generated Data Platform; removed Platform Engineering, Platform Team Workboards (Green).
Tue, Nov 16, 9:45 AM · Generated Data Platform

Thu, Nov 11

gmodena moved T292748: [SPIKE] Create Generic Components for Scheduling from Ready/Groomed 📚 to Work in Progress ⚙️ on the Generated Data Platform board.
Thu, Nov 11, 8:37 AM · Generated Data Platform
gmodena set the point value for T292748: [SPIKE] Create Generic Components for Scheduling to 5.
Thu, Nov 11, 8:37 AM · Generated Data Platform

Wed, Nov 10

gmodena added a comment to T295360: Data pipelines skeleton should be generated from a template.

@gmodena - I think it would be useful to generate an airflow DAG skeleton too, at least a simple example of how to execute the data pipeline code.

Wed, Nov 10, 8:05 PM · Generated Data Platform
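The DAG skeleton suggested above could be prototyped with Python's standard `string.Template`. This is a minimal sketch, not the template used in platform-airflow-dags: the placeholder names (`pipeline_name`, `entry_point`, `schedule`), the start date and the `spark-submit` command are hypothetical, and the rendered code assumes Airflow 2's `BashOperator` import path.

```python
"""Sketch: render an Airflow DAG skeleton for a new data pipeline.

Hypothetical: the placeholder names, the default schedule and the
spark-submit command are illustrative, not the repository's template.
"""
from string import Template

# Generated DAG source; $-placeholders are filled in by render_dag().
DAG_TEMPLATE = Template('''\
from datetime import datetime

from airflow import DAG
from airflow.operators.bash import BashOperator

with DAG(
    dag_id="$pipeline_name",
    schedule_interval="$schedule",
    start_date=datetime(2021, 11, 1),
    catchup=False,
) as dag:
    # Run the pipeline's entry point; replace with the real job.
    run_pipeline = BashOperator(
        task_id="run_$pipeline_name",
        bash_command="spark-submit $entry_point",
    )
''')


def render_dag(pipeline_name: str, entry_point: str, schedule: str = "@daily") -> str:
    """Return the DAG source code for one pipeline.

    substitute() (not safe_substitute()) is used so that a placeholder added
    to the template without a matching argument fails loudly with KeyError.
    """
    return DAG_TEMPLATE.substitute(
        pipeline_name=pipeline_name, entry_point=entry_point, schedule=schedule
    )
```

Because the rendered source is a plain string, its syntax can be checked with `compile()` without installing Airflow; actually loading it still requires an Airflow environment.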

Tue, Nov 9

gmodena created T295360: Data pipelines skeleton should be generated from a template.
Tue, Nov 9, 12:12 PM · Generated Data Platform
gmodena added a comment to T292743: Create Code Repo and Structure.

I have drafted an RFC-style document for this task at https://meta.wikimedia.org/wiki/User:GModena_(WMF)/Pipelines_Repo_Structure.

Tue, Nov 9, 9:32 AM · Generated Data Platform
gmodena moved T292743: Create Code Repo and Structure from Work in Progress ⚙️ to QA/Review ❓ on the Generated Data Platform board.
Tue, Nov 9, 9:30 AM · Generated Data Platform

Mon, Nov 8

gmodena added a comment to T292743: Create Code Repo and Structure.

Now, our plan is that an-airflow1003 becomes the production Airflow instance for the Platform Eng team at some point. So, if platform-airflow-dags targets an-airflow1003, then it will be production, no?

Mon, Nov 8, 12:22 PM · Generated Data Platform

Thu, Nov 4

gmodena updated the task description for T292741: Define and Implement CI Checks.
Thu, Nov 4, 9:26 AM · Generated Data Platform
gmodena added a comment to T292094: Limit GitLab shared runners to trusted contributors.

Hi releng-team,

Thu, Nov 4, 9:24 AM · Release-Engineering-Team (Done by Wed 24 Nov 🔥), SecTeam-Processed, Security-Team, User-brennen, GitLab (CI & Job Runners)

Wed, Nov 3

gmodena updated the task description for T292741: Define and Implement CI Checks.
Wed, Nov 3, 9:45 AM · Generated Data Platform
gmodena moved T292741: Define and Implement CI Checks from Work in Progress ⚙️ to QA/Review ❓ on the Generated Data Platform board.
Wed, Nov 3, 9:43 AM · Generated Data Platform
gmodena updated the task description for T292741: Define and Implement CI Checks.
Wed, Nov 3, 9:42 AM · Generated Data Platform
gmodena updated the task description for T292741: Define and Implement CI Checks.
Wed, Nov 3, 9:36 AM · Generated Data Platform

Mon, Nov 1

gmodena added a comment to T292741: Define and Implement CI Checks.

WIP at https://gitlab.wikimedia.org/gmodena/platform-airflow-dags/-/tree/T292741-implement-ci-checks

Mon, Nov 1, 11:32 AM · Generated Data Platform
gmodena added a comment to T292741: Define and Implement CI Checks.

As part of this task, I would like to mirror the platform-airflow-dags repo to GitHub and add a GitHub Actions workflow for CI.

Mon, Nov 1, 10:52 AM · Generated Data Platform
gmodena moved T293382: [SPIKE] Investigate Different CI Checks from Work in Progress ⚙️ to QA/Review ❓ on the Generated Data Platform board.
Mon, Nov 1, 10:25 AM · Spike, Generated Data Platform
gmodena added a comment to T293382: [SPIKE] Investigate Different CI Checks.

After some investigation, my suggestions would be:

  1. Adopt (optional) type annotations as a development practice for the Python code we own (starting with method signatures and return types).
  2. Add (optional) type checking (mypy) to our linting step.
  3. Add a DAG integrity test suite to cover Airflow code.
Mon, Nov 1, 8:33 AM · Spike, Generated Data Platform
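The three suggestions above can be illustrated together with a small sketch. It is a stdlib-only stand-in, assuming DAG files live as top-level `.py` files in a single folder and that a file "passes" if it imports without raising; an Airflow-based suite would more typically load the folder with `airflow.models.DagBag` and assert that its `import_errors` dict is empty. The function is fully annotated, so running `mypy` over it during linting would check its signature.

```python
"""Minimal, stdlib-only stand-in for a DAG integrity check.

Assumption: DAG files are top-level .py files in one folder, and a file
passes if importing it raises nothing. A real suite would typically use
Airflow's DagBag and assert that DagBag.import_errors is empty.
"""
import importlib.util
from pathlib import Path
from typing import Dict


def collect_import_errors(dag_folder: Path) -> Dict[str, str]:
    """Import every top-level .py file in dag_folder.

    Returns a mapping from each failing file path to a short error message;
    an empty dict means every file imported cleanly.
    """
    errors: Dict[str, str] = {}
    for path in sorted(dag_folder.glob("*.py")):
        spec = importlib.util.spec_from_file_location(path.stem, path)
        if spec is None or spec.loader is None:
            errors[str(path)] = "could not build an import spec"
            continue
        module = importlib.util.module_from_spec(spec)
        try:
            # Executes the file; syntax errors and missing imports surface here.
            spec.loader.exec_module(module)
        except Exception as exc:
            errors[str(path)] = f"{type(exc).__name__}: {exc}"
    return errors
```

In CI, a test could call `collect_import_errors(Path("dags"))` and assert the result is empty, so a DAG file that fails to import breaks the build instead of silently disappearing from the scheduler.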

Oct 27 2021

gmodena added a comment to T293382: [SPIKE] Investigate Different CI Checks.

You can follow WIP on this story at:

Oct 27 2021, 12:39 PM · Spike, Generated Data Platform
gmodena added a comment to T292747: Define and Create Logging Routines - Airflow UI.

@gmodena - my thoughts for this task were to do something simple that also supports a basic use case for a dataset producer/platform engineer.

Oct 27 2021, 11:48 AM · Airflow, Generated Data Platform
gmodena added a comment to T292743: Create Code Repo and Structure.

Do they have to clone/fork the project, make their changes to image recs (without touching the similar users code), and push their changes back? Am I understanding that correctly?

Oct 27 2021, 11:27 AM · Generated Data Platform

Oct 26 2021

gmodena updated the task description for T292743: Create Code Repo and Structure.
Oct 26 2021, 7:00 PM · Generated Data Platform
gmodena updated the task description for T292743: Create Code Repo and Structure.
Oct 26 2021, 6:58 PM · Generated Data Platform
gmodena updated the task description for T292747: Define and Create Logging Routines - Airflow UI.
Oct 26 2021, 6:47 PM · Airflow, Generated Data Platform

Oct 25 2021

gmodena added a comment to T293256: Migrate database from SQLite to MySQL.

A drive-by comment, since T280042 was mentioned - the conversation in that task moved away from MySQL and towards Cassandra (CC'ing @gmodena).

Oct 25 2021, 2:49 PM · Image-Suggestion-API, Platform Team Workboards (Image Suggestion API)
gmodena added a comment to T293808: Design Image Recommendations Schema.

Based on this Superset query, the dataset seems to consist of the following:

Oct 25 2021, 8:36 AM · Generated Data Platform
gmodena added a comment to T293808: Design Image Recommendations Schema.

Based on the TSV files in imagerec_prod.tar.bz2, the dataset seems to consist of the following:

Oct 25 2021, 8:26 AM · Generated Data Platform

Oct 15 2021

gmodena updated the task description for T293382: [SPIKE] Investigate Different CI Checks.
Oct 15 2021, 11:33 AM · Spike, Generated Data Platform

Oct 11 2021

gmodena updated the task description for T276766: 📊 [PLACEHOLDER] Algorithm and data pipeline performance tuning .
Oct 11 2021, 10:50 AM · Image-Suggestion-API, Image-Suggestions, Platform Team Workboards (Image Suggestion API)
gmodena updated the task description for T276766: 📊 [PLACEHOLDER] Algorithm and data pipeline performance tuning .
Oct 11 2021, 10:22 AM · Image-Suggestion-API, Image-Suggestions, Platform Team Workboards (Image Suggestion API)
gmodena updated the task description for T276766: 📊 [PLACEHOLDER] Algorithm and data pipeline performance tuning .
Oct 11 2021, 10:16 AM · Image-Suggestion-API, Image-Suggestions, Platform Team Workboards (Image Suggestion API)
gmodena updated the task description for T276766: 📊 [PLACEHOLDER] Algorithm and data pipeline performance tuning .
Oct 11 2021, 10:15 AM · Image-Suggestion-API, Image-Suggestions, Platform Team Workboards (Image Suggestion API)

Oct 7 2021

gmodena committed rMSSG724450d4c132: Remove unused blubber settings. (authored by gmodena).
Oct 7 2021, 3:59 PM
gmodena committed rMSSG987ec1c48b29: Language fix (authored by gmodena).
Oct 7 2021, 3:59 PM
gmodena committed rMSSG78b4e68e67df: Add test, prep and production variants (authored by gmodena).
Oct 7 2021, 3:59 PM
gmodena committed rMSSGf2a754d568c4: Remove duplicate file (authored by gmodena).
Oct 7 2021, 3:59 PM
gmodena committed rMSSG698226d7c71e: Add build variant (authored by gmodena).
Oct 7 2021, 3:59 PM
gmodena committed rMSSG1c2ee4703b96: Initial blubber pipeline (authored by gmodena).
Oct 7 2021, 3:59 PM

Oct 6 2021

gmodena created P17429 IMA UDFs debug trace.
Oct 6 2021, 5:54 PM

Sep 13 2021

gmodena updated the task description for T287274: [SPIKE][PLACEHOLDER] we need to estimate the effort required to migrate Similarusers' backend to Cassandra.
Sep 13 2021, 7:06 AM · Generated Data Platform

Sep 9 2021

gmodena added a comment to T290664: Agree on a repository structure for Airflow-related code.

Hey @mforns thanks for starting this.

Sep 9 2021, 7:18 PM · Analytics

Aug 6 2021

gmodena closed T288114: Spark download url in build config needs update as Resolved.
Aug 6 2021, 8:55 AM · Platform Team Workboards (Image Suggestion API)

Aug 4 2021

gmodena added a comment to T288114: Spark download url in build config needs update.

PR at https://github.com/mirrys/ImageMatching/pull/28
All checks (main branch) are green again.

Aug 4 2021, 3:57 PM · Platform Team Workboards (Image Suggestion API)
gmodena updated the task description for T288114: Spark download url in build config needs update.
Aug 4 2021, 3:54 PM · Platform Team Workboards (Image Suggestion API)
gmodena moved T288114: Spark download url in build config needs update from In progress to In review on the Platform Team Workboards (Image Suggestion API) board.
Aug 4 2021, 3:53 PM · Platform Team Workboards (Image Suggestion API)
gmodena created T288114: Spark download url in build config needs update.
Aug 4 2021, 3:31 PM · Platform Team Workboards (Image Suggestion API)

Aug 3 2021

gmodena added a comment to T284225: Create airflow instances for Platform Engineering and Research.

Many thanks for this! Just wanted to give an ack that login on the host worked.

Aug 3 2021, 5:52 PM · Patch-For-Review, Analytics-Kanban, Research, Platform Engineering, Analytics

Jul 26 2021

gmodena updated the task description for T287274: [SPIKE][PLACEHOLDER] we need to estimate the effort required to migrate Similarusers' backend to Cassandra.
Jul 26 2021, 4:11 PM · Generated Data Platform

Jul 23 2021

gmodena created T287274: [SPIKE][PLACEHOLDER] we need to estimate the effort required to migrate Similarusers' backend to Cassandra.
Jul 23 2021, 6:35 PM · Generated Data Platform

Jul 14 2021

gmodena updated subscribers of T286036: Ingest user similarity data for June 2021.

The June run completed successfully on 2021-07-13 at 16:00 UTC (18:00 CEST).

Jul 14 2021, 11:43 AM · Platform Engineering, Data-Persistence (Consultation), Platform Team Workboards (Green)
gmodena moved T286036: Ingest user similarity data for June 2021 from Doing to Done on the Platform Team Workboards (Green) board.
Jul 14 2021, 11:40 AM · Platform Engineering, Data-Persistence (Consultation), Platform Team Workboards (Green)
gmodena moved T286036: Ingest user similarity data for June 2021 from Backlog to Doing on the Platform Team Workboards (Green) board.
Jul 14 2021, 11:40 AM · Platform Engineering, Data-Persistence (Consultation), Platform Team Workboards (Green)

Jul 13 2021

gmodena added a comment to T285816: Add an image: generate static file of suggestions.

I have a couple of questions re integration:

Jul 13 2021, 10:36 AM · Growth-Team (Current Sprint), Platform Team Workboards (Image Suggestion API), Image-Suggestions, Image-Suggestion-API, Growth-Structured-Tasks

Jul 2 2021

gmodena closed T284424: Ingest user similarity data for May 2021 as Resolved.
Jul 2 2021, 12:36 PM · Platform Team Workboards (Green)
gmodena closed T284424: Ingest user similarity data for May 2021, a subtask of T265722: New Service Request: Sockpuppet Detection, as Resolved.
Jul 2 2021, 12:36 PM · Platform Engineering, Platform Team Workboards (Green)
gmodena created T286036: Ingest user similarity data for June 2021.
Jul 2 2021, 12:36 PM · Platform Engineering, Data-Persistence (Consultation), Platform Team Workboards (Green)
gmodena added a comment to T284258: Knowledge store data model.

Disclaimer: total MW noob here :).

Jul 2 2021, 8:20 AM · tech-decision-forum

Jul 1 2021

gmodena moved T285494: We should use a unique spelling for neigbhor/neighbour from Ready to Waiting for Review on the Platform Team Workboards (Green) board.
Jul 1 2021, 2:06 PM · Platform Engineering

Jun 24 2021

gmodena updated the task description for T285494: We should use a unique spelling for neigbhor/neighbour.
Jun 24 2021, 6:31 PM · Platform Engineering
gmodena updated the task description for T285494: We should use a unique spelling for neigbhor/neighbour.
Jun 24 2021, 6:30 PM · Platform Engineering
gmodena moved T285494: We should use a unique spelling for neigbhor/neighbour from Next Sprint to Ready on the Platform Team Workboards (Green) board.
Jun 24 2021, 6:26 PM · Platform Engineering
gmodena moved T285494: We should use a unique spelling for neigbhor/neighbour from Backlog to Next Sprint on the Platform Team Workboards (Green) board.
Jun 24 2021, 6:25 PM · Platform Engineering
gmodena assigned T285494: We should use a unique spelling for neigbhor/neighbour to codebug.
Jun 24 2021, 6:23 PM · Platform Engineering
gmodena created T285494: We should use a unique spelling for neigbhor/neighbour.
Jun 24 2021, 6:16 PM · Platform Engineering