
mforns (Marcel Ruiz Forns)
Software Engineer @ Analytics

User Details

User Since
Nov 7 2014, 8:52 PM (531 w, 3 d)
Availability
Available
IRC Nick
mforns
LDAP User
Mforns
MediaWiki User
Mforns (WMF) [ Global Accounts ]

Recent Activity

Yesterday

mforns moved T383364: [temp accounts] Test MediaWiki History Dump from In Progress to Ready to deploy on the DPE Temporary Accounts (Sprint 1) board.
Mon, Jan 13, 7:11 PM · DPE Temporary Accounts (Sprint 1)
mforns added a comment to T383364: [temp accounts] Test MediaWiki History Dump.

The job finished successfully and after vetting the data for a while, I couldn't find any inconsistency or wrong data.
The MediaWiki History data had already been vetted extensively, and this job only transforms it to a TSV format.
So, I trust a simple data check to determine that this works fine :-)
Will move to done.

Mon, Jan 13, 7:11 PM · DPE Temporary Accounts (Sprint 1)

Fri, Jan 10

mforns moved T381390: Modify MW History Reduced dataset for Temp Accounts from Testing to Ready to deploy on the DPE Temporary Accounts (Sprint 1) board.
Fri, Jan 10, 7:26 PM · Patch-For-Review, DPE Temporary Accounts (Sprint 1)
mforns added a comment to T381390: Modify MW History Reduced dataset for Temp Accounts.

I tested the changes to the MediaWiki History reduced query by running it against the 2024-11 snapshot of the newly generated test temp account MWH data.

Fri, Jan 10, 7:25 PM · Patch-For-Review, DPE Temporary Accounts (Sprint 1)
mforns added a comment to T383371: servicelib-golang logger missing required version attribute.

Just to clarify, IIUC, this will only break logging dashboards, right?
So, if we had no associated logging dashboards, then this would be a no-op?
🙏

Fri, Jan 10, 3:25 PM · Commons-Impact-Metrics, Cassandra
mforns added a comment to T378072: [SPIKE] Write up proposal for modularized SessionLength instrument.

One potential improvement regarding the clock ticks that we never tackled was to send them at progressively increasing intervals, instead of every minute.
The current session tick code sends ticks every minute, starting at 0, which gives us a very coarse granularity for short sessions, and too fine granularity for long sessions.
For instance, most sessions (>50%) are shorter than 2 minutes, but we can only tell whether they are 0min long (rounded down) or 1min long (rounded down), which is very little information.
At the same time, we can tell the difference between a 33min session and a 34min session, which is not that important and certainly does not justify sending 34 events.
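
A minimal sketch of the idea, assuming the interval doubles after every tick and starts at 5 seconds. Both values, and the function name `tick_times`, are illustrative assumptions and not part of the current instrument:

```python
def tick_times(session_seconds, first_interval=5, growth=2):
    """Return the elapsed-time marks (in seconds) at which ticks would be sent.

    Assumption: the wait between ticks is multiplied by `growth` after each
    tick, so short sessions get fine granularity and long ones get coarse.
    """
    times = [0]  # the first tick is sent at session start
    interval = first_interval
    elapsed = 0
    while elapsed + interval <= session_seconds:
        elapsed += interval
        times.append(elapsed)
        interval *= growth  # wait longer before the next tick
    return times


# A 90-second session is resolved into several steps:
print(tick_times(90))
# A 34-minute session needs far fewer ticks than one per minute:
print(len(tick_times(34 * 60)))
```

Under these assumptions the error in the measured session length grows roughly in proportion to the length itself, instead of being a fixed minute regardless of how long the session lasted.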

Fri, Jan 10, 3:04 PM · Experimentation Lab (Experiment Platform Sprint 1), Patch-For-Review

Thu, Jan 9

mforns edited projects for T383367: [temp accounts] Modify geoeditors public monthly, added: DPE Temporary Accounts (Sprint 1); removed DPE Temporary Accounts.
Thu, Jan 9, 8:51 PM · DPE Temporary Accounts (Sprint 1)
mforns updated the task description for T383367: [temp accounts] Modify geoeditors public monthly.
Thu, Jan 9, 8:50 PM · DPE Temporary Accounts (Sprint 1)
mforns created T383367: [temp accounts] Modify geoeditors public monthly.
Thu, Jan 9, 8:50 PM · DPE Temporary Accounts (Sprint 1)
mforns created T383366: [temp accounts] Modify geoeditors yearly.
Thu, Jan 9, 8:44 PM · DPE Temporary Accounts (Sprint 1)
mforns moved T378610: Review input data for temp account changes from Tasked to Blocked / Paused on the DPE Temporary Accounts (Sprint 1) board.
Thu, Jan 9, 8:37 PM · DPE Temporary Accounts (Sprint 1)
mforns moved T383364: [temp accounts] Test MediaWiki History Dump from Tasked to In Progress on the DPE Temporary Accounts (Sprint 1) board.
Thu, Jan 9, 8:29 PM · DPE Temporary Accounts (Sprint 1)
mforns moved T382066: Make sure redacted editors are treated properly in MW History from In Progress to Done on the DPE Temporary Accounts (Sprint 1) board.
Thu, Jan 9, 8:29 PM · DPE Temporary Accounts (Sprint 1)
mforns added a comment to T382066: Make sure redacted editors are treated properly in MW History.

After some discussions, we think that the current code does *not* set event_user_is_anonymous to true in this line of code, since actor_name would be null, and so event_user_is_anonymous is also null.
So, the conclusion is that the current code is implementing the version that we thought should be implemented.
Since it seems we are on the right track, will move this to done.

Thu, Jan 9, 8:29 PM · DPE Temporary Accounts (Sprint 1)
mforns claimed T383364: [temp accounts] Test MediaWiki History Dump.
Thu, Jan 9, 8:23 PM · DPE Temporary Accounts (Sprint 1)
mforns created T383364: [temp accounts] Test MediaWiki History Dump.
Thu, Jan 9, 8:23 PM · DPE Temporary Accounts (Sprint 1)
mforns added a comment to T377352: [Update Pipeline] wikidata_coeditors.

@Lydia_Pintscher do you have any updates on this? 🙏

Thu, Jan 9, 7:52 PM · Patch-For-Review, DPE Temporary Accounts (Sprint 1)

Wed, Jan 8

mforns awarded T353817: Create legacy EventLogging proxy HTTP intake (for MediaWikiPingback) endpoint to EventGate a Stroopwafel token.
Wed, Jan 8, 8:05 PM · Data-Engineering (Q2 2024 October 1st - December 31th), MW-1.43-notes (1.43.0-wmf.8; 2024-06-04), MediaWiki-Platform-Team (Radar)
mforns added a comment to T382740: Update Commons Impact Metrics allow-list December 2024.

This has been deployed and the calculations have started; they should be available soon.

Wed, Jan 8, 7:48 PM · Data-Engineering, Commons-Impact-Metrics-Requests, Commons-Impact-Metrics

Tue, Jan 7

mforns added a comment to T220485: Add "Top used photos" metric.

I think there's a tag war between us and Herald...
In any case, this data is available now as part of the Commons Impact Metrics dumps. See:
https://wikitech.wikimedia.org/wiki/Commons_Impact_Metrics/Data_Model#Media_file_metrics_snapshot
Some sorting and filtering still needs to be done manually, but the metric is there.

Tue, Jan 7, 2:59 PM · Data-Engineering-Icebox, Data-Engineering, Data-Engineering-Wikistats

Mon, Dec 23

mforns created T382713: Human pageviews potentially misclassified as automated.
Mon, Dec 23, 3:56 PM · Data-Engineering (Q2 2024 October 1st - December 31th)
mforns added a comment to T381799: wmf_contributors.commons_category_metrics_snapshot generation for year_month=2024-11 is failing.

@VirginiaPoundstone Yes, I think this is fixed.

Mon, Dec 23, 3:03 PM · Experimentation Lab

Thu, Dec 19

mforns moved T382066: Make sure redacted editors are treated properly in MW History from Tasked to In Progress on the DPE Temporary Accounts (Sprint 1) board.
Thu, Dec 19, 9:04 PM · DPE Temporary Accounts (Sprint 1)
mforns claimed T382066: Make sure redacted editors are treated properly in MW History.
Thu, Dec 19, 9:04 PM · DPE Temporary Accounts (Sprint 1)
mforns moved T379768: geoeditors_edits_monthly dag and hql from Blocked / Paused to In Progress on the DPE Temporary Accounts (Sprint 1) board.
Thu, Dec 19, 9:03 PM · Patch-For-Review, DPE Temporary Accounts (Sprint 1)
mforns moved T379631: [Update Pipeline] Update Sqoop for MediaWiki History from Done to Ready to deploy on the DPE Temporary Accounts (Sprint 1) board.
Thu, Dec 19, 9:02 PM · DPE Temporary Accounts (Sprint 1)
mforns moved T379772: Geoeditors Druid load from Blocked / Paused to Testing on the DPE Temporary Accounts (Sprint 1) board.
Thu, Dec 19, 9:02 PM · Patch-For-Review, DPE Temporary Accounts (Sprint 1)
mforns moved T379769: Private geoeditors: monthly dag and hql from Blocked / Paused to Ready to deploy on the DPE Temporary Accounts (Sprint 1) board.
Thu, Dec 19, 8:59 PM · Patch-For-Review, DPE Temporary Accounts (Sprint 1)
mforns moved T377767: [Update Pipeline] edit_hourly from Blocked / Paused to Ready to deploy on the DPE Temporary Accounts (Sprint 1) board.
Thu, Dec 19, 8:57 PM · Patch-For-Review, DPE Temporary Accounts (Sprint 1)
mforns moved T381390: Modify MW History Reduced dataset for Temp Accounts from In Review to Testing on the DPE Temporary Accounts (Sprint 1) board.
Thu, Dec 19, 8:56 PM · Patch-For-Review, DPE Temporary Accounts (Sprint 1)
mforns moved T381288: Vet MW History dataset from In Review to Done on the DPE Temporary Accounts (Sprint 1) board.
Thu, Dec 19, 8:56 PM · DPE Temporary Accounts (Sprint 1)
mforns moved T379728: [Update Pipeline] Geoeditors editors_daily_monthly from Testing to Ready to deploy on the DPE Temporary Accounts (Sprint 1) board.
Thu, Dec 19, 8:56 PM · DPE Temporary Accounts (Sprint 1)
mforns moved T377768: [Update Pipeline] edit_hourly druid load from Testing to Ready to deploy on the DPE Temporary Accounts (Sprint 1) board.
Thu, Dec 19, 8:56 PM · DPE Temporary Accounts (Sprint 1)
mforns moved T379230: [Update Pipeline] Update MediaWiki History to support Temp Accounts from Testing to Ready to deploy on the DPE Temporary Accounts (Sprint 1) board.
Thu, Dec 19, 8:56 PM · Patch-For-Review, DPE Temporary Accounts (Sprint 1)
mforns moved T379728: [Update Pipeline] Geoeditors editors_daily_monthly from In Progress to Testing on the DPE Temporary Accounts (Sprint 1) board.
Thu, Dec 19, 8:54 PM · DPE Temporary Accounts (Sprint 1)

Dec 11 2024

xcollazo awarded T381799: wmf_contributors.commons_category_metrics_snapshot generation for year_month=2024-11 is failing a Party Time token.
Dec 11 2024, 4:00 PM · Experimentation Lab

Dec 10 2024

mforns added a comment to T381799: wmf_contributors.commons_category_metrics_snapshot generation for year_month=2024-11 is failing.

Also this: https://gitlab.wikimedia.org/repos/data-engineering/airflow-dags/-/merge_requests/965

Dec 10 2024, 8:30 PM · Experimentation Lab
mforns added a comment to T381799: wmf_contributors.commons_category_metrics_snapshot generation for year_month=2024-11 is failing.

Looking into this.

Dec 10 2024, 3:27 PM · Experimentation Lab
mforns claimed T381799: wmf_contributors.commons_category_metrics_snapshot generation for year_month=2024-11 is failing.
Dec 10 2024, 3:27 PM · Experimentation Lab

Dec 4 2024

mforns added a comment to T336842: Introduce new schema for WMDE banner metrics.

Hehe, 1 and 2 were intended to be a sequence, rather than 2 options. Sorry, I think my question in the end was misleading.
But knowing that you are OK with (1), 2 is just the application of that in the form of a schema (not a fragment).
And to clarify, the reason to structure the fields in a fragment and not in a schema directly, would be to be able to re-use the fragment for app vs web base schemas.

Dec 4 2024, 5:50 PM · WMDE-FUN-Sprint-2024-09-10, WMDE-FUN-Sprint-2024-08-27, WMDE-FUN-Sprint-2024-08-13, WMDE-FUN-Sprint-2024-07-30, WMDE-FUN-Sprint-2024-07-16, WMDE-Fun-Sprint-2024-07-02, WMDE-FUN-Sprint-2024-06-18, WMDE-FUN-Sprint-2024-06-04, WMDE-FUN-Sprint-2024-05-21, WMDE-FUN-Sprint-2024-05-07, WMDE-FUN-Sprint-2024-04-23, WMDE-FUN-Sprint-2024-04-09, WMDE-FUN-Sprint-2024-02-27, WMDE-FUN-Sprint-2024-02-13, Metrics Platform, Experimentation Lab, WMDE-FUN-Sprint-2024-01-30, WMDE-FUN-Team, WMDE-Fundraising-Tech

Dec 3 2024

mforns added a comment to T372855: migrate Data Platform Engineering maintained metrics from graphite to prometheus.
  1. maintain a different data store for historical metrics (preferably something we already have like an Iceberg table)

+1
I tried this some years ago with the anomaly_detection table, but there were problems with Hive partitioning affecting Superset's querying performance.
Now, with Iceberg, that should be cool and useful!

Dec 3 2024, 7:55 PM · Data-Engineering, Data Pipelines, Technical-Debt, SRE Observability (FY2024/2025-Q3), Observability-Metrics
mforns added a comment to T336842: Introduce new schema for WMDE banner metrics.

@kai.nissen so sorry for having missed all the pings for so long.

Dec 3 2024, 6:41 PM · WMDE-FUN-Sprint-2024-09-10, WMDE-FUN-Sprint-2024-08-27, WMDE-FUN-Sprint-2024-08-13, WMDE-FUN-Sprint-2024-07-30, WMDE-FUN-Sprint-2024-07-16, WMDE-Fun-Sprint-2024-07-02, WMDE-FUN-Sprint-2024-06-18, WMDE-FUN-Sprint-2024-06-04, WMDE-FUN-Sprint-2024-05-21, WMDE-FUN-Sprint-2024-05-07, WMDE-FUN-Sprint-2024-04-23, WMDE-FUN-Sprint-2024-04-09, WMDE-FUN-Sprint-2024-02-27, WMDE-FUN-Sprint-2024-02-13, Metrics Platform, Experimentation Lab, WMDE-FUN-Sprint-2024-01-30, WMDE-FUN-Team, WMDE-Fundraising-Tech
mforns claimed T377280: [CIM] Detail Key in JSON error shows "Category" instead of "Wiki" in error message for top_viewed_categories_monthly.
Dec 3 2024, 5:38 PM · Data-Engineering, Commons-Impact-Metrics
mforns reassigned T380227: Fix failing tests and update test and data from mforns to EChukwukere-WMF.
Dec 3 2024, 5:35 PM · Experimentation Lab (Data Products Sprint 23), Commons-Impact-Metrics
mforns claimed T380227: Fix failing tests and update test and data.
Dec 3 2024, 5:35 PM · Experimentation Lab (Data Products Sprint 23), Commons-Impact-Metrics

Nov 27 2024

mforns created T381026: The cleanup_tmpdumps service fails when the file to delete doesn't exist.
Nov 27 2024, 6:57 PM · Dumps-Generation, Data-Platform, Data-Engineering

Nov 26 2024

mforns added a comment to T380836: Airflow skips canary-event tasks.

This happened again from 2024-11-23, 07:35:00 UTC to 2024-11-23, 09:02:00 UTC.
Many mapped events had multiple retries all the way to 7 retries: https://airflow-analytics.wikimedia.org/dags/canary_events/grid?search=canary_events&num_run[…]00%3A00&tab=mapped_tasks&task_id=produce_canary_event

Nov 26 2024, 6:55 PM · Data-Engineering (Q2 2024 October 1st - December 31th)
mforns renamed T380836: Airflow skips canary-event tasks from Airflow has skipped some canary-event tasks to Airflow skips canary-event tasks.
Nov 26 2024, 6:52 PM · Data-Engineering (Q2 2024 October 1st - December 31th)
mforns renamed T380836: Airflow skips canary-event tasks from Airflow has skipped some canary-event tasks when the Scheduler was failing to Airflow has skipped some canary-event tasks.
Nov 26 2024, 6:51 PM · Data-Engineering (Q2 2024 October 1st - December 31th)

Nov 25 2024

mforns added a comment to T377280: [CIM] Detail Key in JSON error shows "Category" instead of "Wiki" in error message for top_viewed_categories_monthly.

I think this issue affects several endpoints.
All CIM AQS endpoints are using aqsassist's CreateCategoryNotFoundProblem method to generate a problem message.
But not all of them have a category as a parameter.
We could choose a better problem generator from aqsassist, like: CreateMediaFileNotFoundProblem.
But there are some endpoints that have more than one parameter, like category and wiki, and aqsassist doesn't have such a problem generator method.
We could just choose the most important parameter and just modify the commons-impact-analytics code,
or we could create more problem generators in aqsassist, or even generalize the existing ones to include parameters as needed.
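
The last option (a generalized generator that takes the parameters as needed) could look roughly like the sketch below. It is written in Python for illustration only: it is not aqsassist's API, and the function name, the response fields, and the message wording are all assumptions.

```python
def not_found_problem(resource_params, uri):
    """Build a generic 'not found' problem from any set of request parameters.

    `resource_params` maps a parameter name (e.g. "category", "wiki") to the
    value that was requested. Hypothetical helper, not an aqsassist method.
    """
    described = ", ".join(
        f"{name} '{value}'" for name, value in resource_params.items()
    )
    return {
        "status": 404,
        "title": "Not Found",
        "detail": f"No data found for {described}",
        "uri": uri,
    }


# An endpoint with two parameters names both of them in the message:
problem = not_found_problem(
    {"category": "Example_category", "wiki": "en.wikipedia"},
    "/example/path",
)
print(problem["detail"])
```

With a helper of this shape, each endpoint would pass only the parameters it actually has, so the error message would no longer mention a category where none was requested.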

Nov 25 2024, 5:09 PM · Data-Engineering, Commons-Impact-Metrics

Nov 19 2024

mforns moved T267217: MediaWiki Session ID should have per-subdomain and cross-subdomain variants from Incoming to TBD on the Experimentation Lab board.
Nov 19 2024, 4:02 PM · Experimentation Lab, Metrics Platform, MediaWiki-User-management
mforns moved T321850: Add schema diffing support to jsonschema-tools and run diff in CI from Incoming to NEEDS DISCUSSION on the Experimentation Lab board.
Nov 19 2024, 3:54 PM · Data-Engineering, Event-Platform
mforns added a comment to T321850: Add schema diffing support to jsonschema-tools and run diff in CI.

@Ottomata Is there anything we Data Products could do to help with this? Or is it just as a heads up? 🙏

Nov 19 2024, 3:54 PM · Data-Engineering, Event-Platform
mforns moved T378909: eswiki most viewed pages from Spain 2015-2024 from Incoming to TBD on the Experimentation Lab board.
Nov 19 2024, 3:44 PM · Experimentation Lab, Data-Platform
mforns moved T378923: Sqoop all mysql tables from production replicas instead of CloudDB replicas from Incoming to NEEDS DISCUSSION on the Experimentation Lab board.
Nov 19 2024, 3:40 PM · Data-Engineering

Nov 14 2024

mforns moved T377768: [Update Pipeline] edit_hourly druid load from In Review to Testing on the DPE Temporary Accounts (Sprint 1) board.
Nov 14 2024, 7:04 PM · DPE Temporary Accounts (Sprint 1)

Nov 11 2024

mforns moved T379230: [Update Pipeline] Update MediaWiki History to support Temp Accounts from In Progress to In Review on the DPE Temporary Accounts (Sprint 1) board.
Nov 11 2024, 3:58 PM · Patch-For-Review, DPE Temporary Accounts (Sprint 1)

Nov 7 2024

mforns claimed T379230: [Update Pipeline] Update MediaWiki History to support Temp Accounts.
Nov 7 2024, 9:12 AM · Patch-For-Review, DPE Temporary Accounts (Sprint 1)
mforns moved T377768: [Update Pipeline] edit_hourly druid load from In Progress to In Review on the DPE Temporary Accounts (Sprint 1) board.
Nov 7 2024, 9:11 AM · DPE Temporary Accounts (Sprint 1)
mforns moved T379230: [Update Pipeline] Update MediaWiki History to support Temp Accounts from Tasked to In Progress on the DPE Temporary Accounts (Sprint 1) board.
Nov 7 2024, 9:11 AM · Patch-For-Review, DPE Temporary Accounts (Sprint 1)
mforns created T379230: [Update Pipeline] Update MediaWiki History to support Temp Accounts.
Nov 7 2024, 9:11 AM · Patch-For-Review, DPE Temporary Accounts (Sprint 1)

Nov 5 2024

mforns added a comment to T377767: [Update Pipeline] edit_hourly.

@JEbe-WMF Hi! I saw you created the patch with a new query and a new DAG, as if we were going for running a track of pipelines parallel to production.
I thought that strategy was discarded, and that we would go for regular modifications to existing queries and DAGs, but maybe I misunderstood it...? 🙏

Nov 5 2024, 4:54 PM · Patch-For-Review, DPE Temporary Accounts (Sprint 1)
mforns added a comment to T358299: Android add stream `app_patroller_experience` to event sanitization allowlist.

@Dbrant Thanks a lot! I merged the patch; it will be deployed in the next train.

Nov 5 2024, 2:47 PM · Wikipedia-Android-App-Backlog (Android Release - FY2024-25)
mforns reopened T358299: Android add stream `app_patroller_experience` to event sanitization allowlist as "Open".

Hey all! I was asked to review another unrelated sanitization patch, and I saw that the sanitization snippet for app_patroller_experience was not hashing the fields app_install_id and app_session_id.
Those are long lived identifiers and should ideally be hashed. Is there a reason not to? 🙏
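
For illustration, hashing such identifier fields could be sketched as follows. This assumes a keyed hash (HMAC-SHA256) with a secret salt; the helper name, the salt handling, and the sample event are hypothetical and do not describe the actual sanitization code.

```python
import hashlib
import hmac


def hash_fields(event, fields, salt):
    """Return a copy of `event` with the given fields replaced by keyed hashes.

    Assumption: HMAC-SHA256 keyed with a secret `salt` (bytes). The same input
    always maps to the same hash while the salt stays the same, so sessions can
    still be grouped without exposing the raw identifier.
    """
    sanitized = dict(event)
    for field in fields:
        value = sanitized.get(field)
        if value is not None:
            digest = hmac.new(salt, str(value).encode("utf-8"), hashlib.sha256)
            sanitized[field] = digest.hexdigest()
    return sanitized


# Hypothetical event; only the two identifier fields are hashed.
event = {"app_install_id": "abc-123", "app_session_id": "xyz-789", "action": "init"}
print(hash_fields(event, ["app_install_id", "app_session_id"], salt=b"example-salt"))
```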

Nov 5 2024, 12:26 PM · Wikipedia-Android-App-Backlog (Android Release - FY2024-25)

Nov 4 2024

mforns moved T370470: [CIM] Skewed ranking with the top Editors monthly API from Paused to Needs Review on the Experimentation Lab (Data Products (Data Products Sprint 21 🪂)) board.
Nov 4 2024, 5:21 PM · Data-Engineering, Patch-For-Review, Commons-Impact-Metrics
mforns moved T370470: [CIM] Skewed ranking with the top Editors monthly API from In Process to Paused on the Experimentation Lab (Data Products (Data Products Sprint 21 🪂)) board.
Nov 4 2024, 5:21 PM · Data-Engineering, Patch-For-Review, Commons-Impact-Metrics
mforns moved T377279: [CIM] Detail Key in JSON error shows "Category" instead of "Media" in error message from To Deploy to Testing on the Experimentation Lab (Data Products (Data Products Sprint 21 🪂)) board.
Nov 4 2024, 3:57 PM · Data-Engineering, Commons-Impact-Metrics
mforns moved T377280: [CIM] Detail Key in JSON error shows "Category" instead of "Wiki" in error message for top_viewed_categories_monthly from To Deploy to Testing on the Experimentation Lab (Data Products (Data Products Sprint 21 🪂)) board.
Nov 4 2024, 3:57 PM · Data-Engineering, Commons-Impact-Metrics
mforns moved T377281: [CIM] Detail Key in JSON error shows "Category" instead of "user name" in error message for edits_per_user_monthly from To Deploy to Testing on the Experimentation Lab (Data Products (Data Products Sprint 21 🪂)) board.
Nov 4 2024, 3:57 PM · Data-Engineering, Commons-Impact-Metrics
mforns reassigned T370470: [CIM] Skewed ranking with the top Editors monthly API from mforns to SGupta-WMF.
Nov 4 2024, 3:33 PM · Data-Engineering, Patch-For-Review, Commons-Impact-Metrics
mforns added a comment to T370470: [CIM] Skewed ranking with the top Editors monthly API.

@SGupta-WMF I retried the whole process and both unit and integration tests work for me, as does data ingestion on the cassandra test env side.
I asked Emeka to try it too, to see if he could reproduce your errors, but we managed to ingest the data and pass the tests fine.
Could it be something on your side?

Nov 4 2024, 3:33 PM · Data-Engineering, Patch-For-Review, Commons-Impact-Metrics
mforns moved T377768: [Update Pipeline] edit_hourly druid load from Tasked to In Progress on the DPE Temporary Accounts (Sprint 1) board.
Nov 4 2024, 3:29 PM · DPE Temporary Accounts (Sprint 1)

Oct 30 2024

mforns reassigned T377279: [CIM] Detail Key in JSON error shows "Category" instead of "Media" in error message from mforns to SGupta-WMF.
Oct 30 2024, 6:02 PM · Data-Engineering, Commons-Impact-Metrics
mforns reassigned T377280: [CIM] Detail Key in JSON error shows "Category" instead of "Wiki" in error message for top_viewed_categories_monthly from mforns to SGupta-WMF.
Oct 30 2024, 6:02 PM · Data-Engineering, Commons-Impact-Metrics
mforns reassigned T377281: [CIM] Detail Key in JSON error shows "Category" instead of "user name" in error message for edits_per_user_monthly from mforns to SGupta-WMF.
Oct 30 2024, 6:02 PM · Data-Engineering, Commons-Impact-Metrics
mforns moved T377279: [CIM] Detail Key in JSON error shows "Category" instead of "Media" in error message from Needs Review to To Deploy on the Experimentation Lab (Data Products (Data Products Sprint 21 🪂)) board.
Oct 30 2024, 6:01 PM · Data-Engineering, Commons-Impact-Metrics
mforns moved T377280: [CIM] Detail Key in JSON error shows "Category" instead of "Wiki" in error message for top_viewed_categories_monthly from Needs Review to To Deploy on the Experimentation Lab (Data Products (Data Products Sprint 21 🪂)) board.
Oct 30 2024, 6:01 PM · Data-Engineering, Commons-Impact-Metrics
mforns moved T377281: [CIM] Detail Key in JSON error shows "Category" instead of "user name" in error message for edits_per_user_monthly from Needs Review to To Deploy on the Experimentation Lab (Data Products (Data Products Sprint 21 🪂)) board.
Oct 30 2024, 6:01 PM · Data-Engineering, Commons-Impact-Metrics
mforns moved T375527: NEW BUG REPORT - Issues in calculation logic for unique devices tables from In Process to To Deploy on the Experimentation Lab (Data Products (Data Products Sprint 21 🪂)) board.
Oct 30 2024, 3:10 PM · Experimentation Lab (Data Products (Data Products Sprint 21 🪂)), Data-Engineering (Q2 2024 October 1st - December 31th), Traffic, Data-Platform

Oct 29 2024

mforns added a comment to T371532: Request for Files provided by Centro de Fotografía de Montevideo.

We fixed the issue that was preventing your category from being queried via the API endpoints.
It is working now, see: https://wikimedia.org/api/rest_v1/metrics/commons-analytics/pageviews-per-category-monthly/Files_provided_by_Centro_de_Fotograf%C3%ADa_de_Montevideo/shallow/en.wikipedia/00000101/99991231
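
As a rough sketch, the URL above can be assembled by percent-encoding the category name. The parameter names (`category`, `scope`, `wiki`, `start`, `end`) are assumptions inferred from the path segments of that URL, not taken from the API documentation.

```python
from urllib.parse import quote

BASE = "https://wikimedia.org/api/rest_v1/metrics/commons-analytics"


def pageviews_per_category_url(category, scope, wiki, start, end):
    """Build a pageviews-per-category-monthly URL from its path segments.

    The category is percent-encoded as UTF-8 so that non-ASCII characters
    such as "í" are safe to use in the path.
    """
    segments = [
        "pageviews-per-category-monthly",
        quote(category, safe=""),
        scope,
        wiki,
        start,
        end,
    ]
    return BASE + "/" + "/".join(segments)


print(pageviews_per_category_url(
    "Files_provided_by_Centro_de_Fotografía_de_Montevideo",
    "shallow", "en.wikipedia", "00000101", "99991231",
))
```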

Oct 29 2024, 8:14 PM · Experimentation Lab (Data Products Sprint 22), Commons-Impact-Metrics, Commons-Impact-Metrics-Requests

Oct 28 2024

mforns added a comment to T377600: [refine] Add support for custom Hive partitioning.

I think the timing depended on when data is going to start flowing in. Probably Q3?

Oct 28 2024, 6:25 PM · Data-Engineering, Experimentation Lab

Oct 25 2024

mforns added a comment to T375527: NEW BUG REPORT - Issues in calculation logic for unique devices tables.

I finished testing the changes.

Oct 25 2024, 9:27 PM · Experimentation Lab (Data Products (Data Products Sprint 21 🪂)), Data-Engineering (Q2 2024 October 1st - December 31th), Traffic, Data-Platform

Oct 24 2024

mforns added a comment to T370470: [CIM] Skewed ranking with the top Editors monthly API.

Did you run make bootstrap?

Yes, I:

  • Dropped the related docker containers altogether.
  • Deleted the data-gateway/ directory, since it has significant changes.
  • Executed make startup.
  • Executed make bootstrap.

Oct 24 2024, 6:32 PM · Data-Engineering, Patch-For-Review, Commons-Impact-Metrics

Oct 23 2024

mforns added a comment to T370470: [CIM] Skewed ranking with the top Editors monthly API.

@SGupta-WMF could you provide more details about the failure?
(before I pushed the changes, it was working for me)

Oct 23 2024, 5:15 PM · Data-Engineering, Patch-For-Review, Commons-Impact-Metrics

Oct 21 2024

mforns moved T376196: Modify the automated traffic detection to include Redirects to pageviews from Sprint Backlog to In Process on the Experimentation Lab (Data Products (Data Products Sprint 21 🪂)) board.
Oct 21 2024, 2:56 PM · Experimentation Lab, Movement-Insights, Data-Platform

Oct 18 2024

mforns updated the task description for T377600: [refine] Add support for custom Hive partitioning.
Oct 18 2024, 3:48 PM · Data-Engineering, Experimentation Lab
mforns created T377600: [refine] Add support for custom Hive partitioning.
Oct 18 2024, 3:47 PM · Data-Engineering, Experimentation Lab
mforns updated subscribers of T366554: Views data integrity compromised by entity running up fake views.

@Stevietheman Hi! Thanks for letting us know about this issue.

Oct 18 2024, 3:36 PM · Data-Engineering, SecTeam-Processed, Pageviews-Anomaly, Security

Oct 16 2024

mforns moved T376196: Modify the automated traffic detection to include Redirects to pageviews from In Process to Sprint Backlog on the Experimentation Lab (Data Products Sprint 20 🎯) board.
Oct 16 2024, 1:07 PM · Experimentation Lab, Movement-Insights, Data-Platform
mforns moved T376196: Modify the automated traffic detection to include Redirects to pageviews from Sprint Backlog to In Process on the Experimentation Lab (Data Products Sprint 20 🎯) board.
Oct 16 2024, 1:07 PM · Experimentation Lab, Movement-Insights, Data-Platform

Oct 15 2024

mforns added a comment to T373206: [Commons Impact Metrics]: Cannot return data from categories with special characters.

Just for the record,
These issues happen to be mostly solved by another fix detailed in T368035.
By moving data retrieval to the Data Gateway the special-character issue disappeared.
There's one minor issue left, which is described in T377256.

Oct 15 2024, 6:42 PM · Experimentation Lab (Data Products Sprint 20 🎯), Commons-Impact-Metrics
mforns created T377256: [Commons Impact Metrics] Bug: Category and Media File names are URL encoded in the AQS response context section.
Oct 15 2024, 6:41 PM · Data-Engineering, Commons-Impact-Metrics
mforns added a comment to T370470: [CIM] Skewed ranking with the top Editors monthly API.

@SGupta-WMF Thanks a lot! I have pushed the changes on the cassandra test env, plus some more changes on the AQS service MR, related to the integration tests. 🙏 🙏 🙏

Oct 15 2024, 6:31 PM · Data-Engineering, Patch-For-Review, Commons-Impact-Metrics
mforns moved T376912: [CIM] cim_top_viewed_categories_monthly and top_edited_categories_monthly APIs are failing with 500 status code in PROD from Sprint Backlog to Sign Off on the Experimentation Lab (Data Products Sprint 20 🎯) board.
Oct 15 2024, 12:13 PM · Commons-Impact-Metrics, Experimentation Lab (Data Products Sprint 20 🎯)
mforns set the point value for T376912: [CIM] cim_top_viewed_categories_monthly and top_edited_categories_monthly APIs are failing with 500 status code in PROD to 3.
Oct 15 2024, 12:13 PM · Commons-Impact-Metrics, Experimentation Lab (Data Products Sprint 20 🎯)
mforns changed the point value for T370470: [CIM] Skewed ranking with the top Editors monthly API from 3 to 13.
Oct 15 2024, 12:11 PM · Data-Engineering, Patch-For-Review, Commons-Impact-Metrics
mforns set Final Story Points to 21 on T368035: Data gateway integration in CIM APIs.
Oct 15 2024, 12:10 PM · Experimentation Lab (Data Products Sprint 20 🎯), Patch-For-Review, AQS2.0

Oct 11 2024

mforns added a comment to T322690: Add support for repository artifacts in Airflow.

Oh my. Just realized I've been neglecting this task for months. Sorry for that.

Oct 11 2024, 3:48 PM · Data-Engineering-Icebox, Data-Engineering, Data Pipelines
mforns placed T307540: [Airflow Migration] Migrate reportupdater jobs up for grabs.
Oct 11 2024, 3:17 PM · Data-Engineering, Data Pipelines

Oct 9 2024

mforns closed T374699: Generate fake event data for community updates in the data lake as Resolved.

Thanks for the ping @Aklapper!

Oct 9 2024, 3:41 PM · WMF-SDS 2 Sprinthackular 2024