
APizzata-WMF (a-pizzata)
User

Projects (1)


User Details

User Since
Oct 3 2025, 9:25 AM (18 w, 6 d)
Availability
Available
LDAP User
Unknown
MediaWiki User
APizzata-WMF

Recent Activity

Today

APizzata-WMF moved T368987: Add an Image: filtering by suggestion "kind" or "confidence" from Next Up to In progress on the Data-Engineering (Q3 FY25/26 January 1st - March 31th) board.
Thu, Feb 12, 10:35 AM · Data-Engineering (Q3 FY25/26 January 1st - March 31th), Growth-Team, Image-Suggestions
APizzata-WMF added a comment to T412461: On reconcile, consider what happens when a restore and a delete happen on the same revision.

Yes indeed, in T416491 I have an example of this situation and a possible fix. Should we close this, or link the two together?

Thu, Feb 12, 10:11 AM · Data-Engineering, DPE-Mediawiki-Content
APizzata-WMF added a comment to T410431: Troubleshoot duplicates issue in mw_content_merge_events_to_mw_content_history_daily.
spark.sql("""
SELECT count(1) as count FROM (
  SELECT count(1) as count,
         wiki_id,
         revision_id
  FROM wmf_content.mediawiki_content_history_v1
  GROUP BY wiki_id, revision_id
  HAVING count > 1
)
""").show(300, truncate=False)

returned

+-----+
|count|
+-----+
|0    |
+-----+

As already stated in T410431#11566608, we now consider the issue solved. I will push the relevant changes to the repo and close the ticket.
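For a quick sanity check outside Spark, the GROUP BY wiki_id, revision_id ... HAVING count > 1 logic of the query above can be mirrored in plain Python. This is a minimal sketch: the function name and the toy rows are hypothetical and not part of the pipeline.

```python
from collections import Counter
from typing import Iterable, Tuple

# Each row is a (wiki_id, revision_id) pair, i.e. the grouping keys of the query.
Row = Tuple[str, int]


def count_duplicate_keys(rows: Iterable[Row]) -> int:
    """Count the (wiki_id, revision_id) keys that occur more than once.

    Same idea as: SELECT count(1) FROM (... GROUP BY wiki_id, revision_id HAVING count > 1)
    """
    counts = Counter(rows)
    return sum(1 for n in counts.values() if n > 1)


# Hypothetical toy rows, not real data.
sample = [
    ("enwiki", 1),
    ("enwiki", 2),
    ("enwiki", 2),       # revision 2 is duplicated within enwiki
    ("commonswiki", 1),  # same revision_id on another wiki: not a duplicate
]
print(count_duplicate_keys(sample))  # 1
```

Grouping on the pair, not on revision_id alone, matters because revision ids are only unique within a wiki.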

Thu, Feb 12, 8:44 AM · Data-Engineering (Q3 FY25/26 January 1st - March 31th)

Tue, Feb 10

APizzata-WMF added a comment to T416312: Use wmf.mediawiki_history as baseline for slo completeness.

a one-time comparison to wmf.mediawiki_history

Tue, Feb 10, 1:12 PM · DPE-Mediawiki-Content, Data-Engineering (Q3 FY25/26 January 1st - March 31th)
APizzata-WMF updated subscribers of T416491: Missing reconciliation for MWCH.

My change is the following:

from pyspark.sql import functions as F
from pyspark.sql.window import Window
deletes_and_restores_query="""
SELECT distinct log_page AS page_id, log_action, log_timestamp
Tue, Feb 10, 10:23 AM · DPE-Mediawiki-Content, Data-Engineering (Q3 FY25/26 January 1st - March 31th)

Mon, Feb 9

APizzata-WMF updated the task description for T415311: MediaWiki content history dataset issues.
Mon, Feb 9, 4:48 PM · Data-Engineering (Q3 FY25/26 January 1st - March 31th), Research

Thu, Feb 5

APizzata-WMF updated the task description for T416491: Missing reconciliation for MWCH.
Thu, Feb 5, 3:03 PM · DPE-Mediawiki-Content, Data-Engineering (Q3 FY25/26 January 1st - March 31th)
APizzata-WMF updated subscribers of T416312: Use wmf.mediawiki_history as baseline for slo completeness.

After a fruitful conversation with @JAllemandou, we concluded that, given the different nature of the data stored in wmf.mediawiki_history and in the events.* tables, it would be better to think of a decoupled metric:

Thu, Feb 5, 2:12 PM · DPE-Mediawiki-Content, Data-Engineering (Q3 FY25/26 January 1st - March 31th)

Wed, Feb 4

APizzata-WMF added a project to T416491: Missing reconciliation for MWCH: DPE-Mediawiki-Content.
Wed, Feb 4, 3:06 PM · DPE-Mediawiki-Content, Data-Engineering (Q3 FY25/26 January 1st - March 31th)
APizzata-WMF created T416491: Missing reconciliation for MWCH.
Wed, Feb 4, 3:05 PM · DPE-Mediawiki-Content, Data-Engineering (Q3 FY25/26 January 1st - March 31th)

Tue, Feb 3

APizzata-WMF created T416312: Use wmf.mediawiki_history as baseline for slo completeness.
Tue, Feb 3, 10:13 AM · DPE-Mediawiki-Content, Data-Engineering (Q3 FY25/26 January 1st - March 31th)

Mon, Feb 2

APizzata-WMF moved T415638: Make canary-events for the `resource_change` stream from Ready to Deploy to Done on the Data-Engineering (Q3 FY25/26 January 1st - March 31th) board.
Mon, Feb 2, 4:39 PM · Data-Engineering (Q3 FY25/26 January 1st - March 31th)
APizzata-WMF moved T414389: Publish Dumps 2 to dumps.wikimedia.org and provide only monthly dumps from In progress to Done on the Data-Engineering (Q3 FY25/26 January 1st - March 31th) board.
Mon, Feb 2, 4:31 PM · User-notice, Data-Engineering (Q3 FY25/26 January 1st - March 31th)
APizzata-WMF moved T414784: Test the dbt+skein approach to running dbt Spark jobs in K8s from Next Up to Done on the Data-Engineering (Q3 FY25/26 January 1st - March 31th) board.
Mon, Feb 2, 4:15 PM · Data-Engineering (Q3 FY25/26 January 1st - March 31th)
APizzata-WMF moved T415267: aggregate_for_fundraising_hourly failing for last 24 hours from Next Up to In progress on the Data-Engineering (Q3 FY25/26 January 1st - March 31th) board.
Mon, Feb 2, 4:15 PM · Data-Engineering (Q3 FY25/26 January 1st - March 31th)
APizzata-WMF moved T415194: Create a `DbtSkeinOperator` in the Airflow `wmf_airflow_common` library from Next Up to In progress on the Data-Engineering (Q3 FY25/26 January 1st - March 31th) board.
Mon, Feb 2, 4:13 PM · Data-Engineering (Q3 FY25/26 January 1st - March 31th)
APizzata-WMF moved T414107: Inventory of SystemD timer based jobs and pipelines from In Review to Done on the Data-Engineering (Q3 FY25/26 January 1st - March 31th) board.
Mon, Feb 2, 4:12 PM · Essential-Work, Data-Engineering (Q3 FY25/26 January 1st - March 31th)
APizzata-WMF moved T410431: Troubleshoot duplicates issue in mw_content_merge_events_to_mw_content_history_daily from In progress to In Review on the Data-Engineering (Q3 FY25/26 January 1st - March 31th) board.
Mon, Feb 2, 4:06 PM · Data-Engineering (Q3 FY25/26 January 1st - March 31th)

Thu, Jan 29

APizzata-WMF added a comment to T410431: Troubleshoot duplicates issue in mw_content_merge_events_to_mw_content_history_daily.

We have changed the pushdown_strategy to earliest_revision_dt, which should avoid the duplication. AFAICS, since 2026-01-11 (the day we made the change) we have not had any more duplicates.
We will continue monitoring the situation for the next 2 weeks (up to 2026-02-11), and if no duplicates show up we can consider the issue fixed.
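To build intuition for why the pushdown choice can matter, here is a toy model, not the actual merge job. It assumes that a pushdown restricts which target rows the MERGE can see: 'set_of_page_ids' keeps target rows whose page_id appears in the batch, and 'earliest_revision_dt' keeps target rows at or after the batch's earliest revision timestamp. The Row class, the filter functions and the ids are all hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Row:
    wiki_id: str
    revision_id: int
    page_id: int
    revision_dt: str  # fixed-width ISO-8601 UTC string, so string order == time order


Pushdown = Callable[[Row, List[Row]], bool]


def set_of_page_ids(target_row: Row, batch: List[Row]) -> bool:
    # Hypothetical: only target rows whose page_id appears in the batch are scanned.
    return target_row.page_id in {r.page_id for r in batch}


def earliest_revision_dt(target_row: Row, batch: List[Row]) -> bool:
    # Hypothetical: only target rows at or after the batch's earliest revision_dt are scanned.
    return target_row.revision_dt >= min(r.revision_dt for r in batch)


def merge(target: List[Row], batch: List[Row], pushdown: Pushdown) -> List[Row]:
    """Toy MERGE keyed on (wiki_id, revision_id): update on match, insert otherwise."""
    result = list(target)
    visible = [i for i, t in enumerate(result) if pushdown(t, batch)]
    for src in batch:
        key = (src.wiki_id, src.revision_id)
        hits = [i for i in visible if (result[i].wiki_id, result[i].revision_id) == key]
        if hits:
            for i in hits:
                result[i] = src  # WHEN MATCHED THEN UPDATE
        else:
            result.append(src)  # WHEN NOT MATCHED THEN INSERT
            visible.append(len(result) - 1)  # inserted rows are visible to later batch rows
    return result


target = [Row("commonswiki", 42, 1, "2025-11-20T07:39:21Z")]
# The same revision reappears under a different page_id (e.g. after an undelete).
batch = [Row("commonswiki", 42, 2, "2025-11-20T07:39:21Z")]

print(len(merge(target, batch, set_of_page_ids)))       # 2: old row never matched, duplicate
print(len(merge(target, batch, earliest_revision_dt)))  # 1: old row updated in place
```

The design point the toy illustrates: a revision's timestamp does not change when the revision moves between pages, while its page_id can, so a filter on revision_dt keeps the old row in scope.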

Thu, Jan 29, 4:32 PM · Data-Engineering (Q3 FY25/26 January 1st - March 31th)

Thu, Jan 22

APizzata-WMF created T415264: mw_content_history_reconcile_enrich api call returned 503.
Thu, Jan 22, 2:33 PM · Data-Engineering (Q3 FY25/26 January 1st - March 31th)

Wed, Jan 21

APizzata-WMF created T415195: Review SLIS image suggestion pipeline.
Wed, Jan 21, 1:54 PM · Data-Engineering (Q3 FY25/26 January 1st - March 31th)
APizzata-WMF moved T413075: Review ALIS image suggestion pipeline from In Review to Done on the Data-Engineering (Q3 FY25/26 January 1st - March 31th) board.
Wed, Jan 21, 1:29 PM · Data-Engineering (Q3 FY25/26 January 1st - March 31th)

Tue, Jan 20

APizzata-WMF moved T414779: Change delete selection for SLO metric from Blocked/Paused to Ready to Deploy on the Data-Engineering (Q3 FY25/26 January 1st - March 31th) board.
Tue, Jan 20, 7:11 AM · DPE-Mediawiki-Content, Data-Engineering (Q3 FY25/26 January 1st - March 31th)
APizzata-WMF moved T414779: Change delete selection for SLO metric from In progress to Blocked/Paused on the Data-Engineering (Q3 FY25/26 January 1st - March 31th) board.
Tue, Jan 20, 7:11 AM · DPE-Mediawiki-Content, Data-Engineering (Q3 FY25/26 January 1st - March 31th)
APizzata-WMF moved T413075: Review ALIS image suggestion pipeline from In progress to In Review on the Data-Engineering (Q3 FY25/26 January 1st - March 31th) board.
Tue, Jan 20, 7:11 AM · Data-Engineering (Q3 FY25/26 January 1st - March 31th)

Mon, Jan 19

APizzata-WMF added a comment to T410431: Troubleshoot duplicates issue in mw_content_merge_events_to_mw_content_history_daily.

I have yet to understand what's happening behind the scenes, but maybe the presence of the delete prevents a correct reconciliation?
What are your thoughts, @xcollazo @JAllemandou?

Mon, Jan 19, 4:09 PM · Data-Engineering (Q3 FY25/26 January 1st - March 31th)
APizzata-WMF added a comment to T410431: Troubleshoot duplicates issue in mw_content_merge_events_to_mw_content_history_daily.

While exploring the data and validating the solution in T414779, I found the following curious example:
page_id = 69510715 and wiki_id = 'enwiki'

Mon, Jan 19, 2:03 PM · Data-Engineering (Q3 FY25/26 January 1st - March 31th)
APizzata-WMF added a comment to T414779: Change delete selection for SLO metric.

Testing the following solution:

deletes_and_undeletes as (
    -- select every delete and undelete event per page
    -- and rank them so that the latest operation (by meta.dt) gets rn = 1
    SELECT
        wiki_id,
        page['page_id'] AS page_id,
        page_change_kind,
        row_number() over (
            partition by wiki_id, page['page_id'] order by meta.dt desc
        ) as rn
    FROM
        {source_mw_event_page_change_table}
    WHERE
        {hive_filter}
        AND page_change_kind in ('delete', 'undelete')
),
deleted_pages as (
    -- identify the pages that have been deleted and not restored:
    -- the most recent operation is a delete
    SELECT
        wiki_id,
        page_id
    FROM
        deletes_and_undeletes
    WHERE
        rn = 1
        AND page_change_kind = 'delete'
),
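The selection logic of the two CTEs above (rank delete/undelete events per page by meta.dt, keep pages whose latest one is a delete) can be reproduced in plain Python for testing on small samples. A minimal sketch: the function names are hypothetical, and the sample events are the commonswiki delete/undelete rows from the T410431 query output further down this page. Unlike row_number(), which breaks ties arbitrarily, this version keeps the first event seen on a tie.

```python
from datetime import datetime
from typing import Dict, Iterable, Set, Tuple

# (wiki_id, page_id, page_change_kind, meta_dt)
Event = Tuple[str, int, str, datetime]


def parse_dt(s: str) -> datetime:
    # meta.dt values carry fractional seconds and a trailing "Z"
    return datetime.strptime(s, "%Y-%m-%dT%H:%M:%S.%f%z")


def deleted_pages(events: Iterable[Event]) -> Set[Tuple[str, int]]:
    """Pages whose most recent delete/undelete event is a delete."""
    latest: Dict[Tuple[str, int], Tuple[datetime, str]] = {}
    for wiki_id, page_id, kind, dt in events:
        if kind not in ("delete", "undelete"):
            continue  # same as the IN ('delete', 'undelete') filter
        key = (wiki_id, page_id)
        if key not in latest or dt > latest[key][0]:
            latest[key] = (dt, kind)  # keep only the rn = 1 event per page
    return {key for key, (_, kind) in latest.items() if kind == "delete"}


events = [
    ("commonswiki", 100282687, "delete", parse_dt("2025-11-20T21:16:29.814428Z")),
    ("commonswiki", 100282687, "undelete", parse_dt("2025-11-20T21:16:35.813491Z")),
    ("commonswiki", 178775087, "undelete", parse_dt("2025-11-20T21:34:28.240769Z")),
]
print(deleted_pages(events))  # set(): each page's latest operation is an undelete
```

Timestamps are parsed instead of compared as strings because meta.dt has a variable number of fractional digits, which breaks lexical ordering.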
Mon, Jan 19, 12:41 PM · DPE-Mediawiki-Content, Data-Engineering (Q3 FY25/26 January 1st - March 31th)

Fri, Jan 16

APizzata-WMF created T414781: Gitlab data-engineering resource access tokens expiring.
Fri, Jan 16, 10:23 AM · Release-Engineering-Team (Doing 😎), GitLab (Integrations), User-brennen
APizzata-WMF created T414779: Change delete selection for SLO metric.
Fri, Jan 16, 10:06 AM · DPE-Mediawiki-Content, Data-Engineering (Q3 FY25/26 January 1st - March 31th)

Wed, Jan 14

APizzata-WMF added a comment to T413075: Review ALIS image suggestion pipeline.

Created a document with info about the ALIS and Cassandra DAGs.
I will use the same file for future notes and info dumps.

Wed, Jan 14, 4:29 PM · Data-Engineering (Q3 FY25/26 January 1st - March 31th)

Tue, Jan 13

APizzata-WMF added a comment to T410431: Troubleshoot duplicates issue in mw_content_merge_events_to_mw_content_history_daily.

After today's meeting with @xcollazo and @JAllemandou we realised the following.
Using the query in T410431#11465081:

spark.sql("""
select 
    meta.dt,
    revision.rev_dt, 
    day,month,year, 
    page.page_id, 
    page.page_title,
    revision.rev_id, 
    page_change_kind
from  
    event.mediawiki_page_content_change_v1
where wiki_id= 'commonswiki' 
    and page.page_id in(178775087,100282687)  
order by meta.dt, page.page_id asc
""").show(truncate=False)
+---------------------------+--------------------+---+-----+----+---------+----------------------------+----------+----------------+
|dt                         |rev_dt              |day|month|year|page_id  |page_title                  |rev_id    |page_change_kind|
+---------------------------+--------------------+---+-----+----+---------+----------------------------+----------+----------------+
|2025-11-19T07:26:41.89519Z |2025-11-19T07:26:37Z|19 |11   |2025|100282687|File:Flag_of_Ulleung.svg    |1118294200|edit            |
|2025-11-20T07:39:25.814734Z|2025-11-20T07:39:21Z|20 |11   |2025|100282687|File:Flag_of_Ulleung.svg    |1118806092|edit            |
|2025-11-20T21:16:29.814428Z|2025-11-20T07:39:21Z|20 |11   |2025|100282687|File:Flag_of_Ulleung.svg    |1118806092|delete          |
|2025-11-20T21:16:35.813491Z|2025-11-19T07:26:37Z|20 |11   |2025|100282687|File:Flag_of_Ulleung.svg    |1118294200|undelete        |
|2025-11-20T21:16:37.824006Z|2025-11-20T21:16:32Z|20 |11   |2025|100282687|File:Flag_of_Ulleung_(2).svg|1119085982|move            |
|2025-11-20T21:17:31.75636Z |2025-11-20T21:17:30Z|20 |11   |2025|100282687|File:Flag_of_Ulleung_(2).svg|1119086355|edit            |
|2025-11-20T21:34:28.240769Z|2025-11-20T07:39:21Z|20 |11   |2025|178775087|File:Flag_of_Ulleung.svg    |1118806092|undelete        |
|2025-11-20T21:34:44.24225Z |2025-11-20T21:34:41Z|20 |11   |2025|178775087|File:Flag_of_Ulleung.svg    |1119092139|edit            |
|2025-12-05T08:38:05.952091Z|2025-12-05T08:38:02Z|5  |12   |2025|100282687|File:Flag_of_Ulleung_(2).svg|1125950709|edit            |
|2025-12-05T08:38:40.678265Z|2025-12-05T08:38:36Z|5  |12   |2025|178775087|File:Flag_of_Ulleung.svg    |1125950925|edit            |
+---------------------------+--------------------+---+-----+----+---------+----------------------------+----------+----------------+

Ordering the result by meta.dt, we can infer that the undelete of rev_id 1118294200 on page_id 100282687 was a way to revert edit 1118806092.
The move 1119085982 corresponds to the log entry "moved page File:Flag of Ulleung.svg to File:Flag of Ulleung (2).svg without leaving a redirect: using SplitFileHistory.js". SplitFileHistory.js is a procedure that users apply deliberately, and it lets them move the page to a new title without leaving a redirect.
Finally, the duplication of rev_id 1118294298 comes from the delete of 1118806092 on page_id 100282687, combined with the optimization we apply when merging data (the 'set_of_page_ids' pushdown_strategy), which avoids a full scan of the table.
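A cheap diagnostic for this pattern is to look for revisions that appear under more than one page_id in the events. The sketch below runs that check over the rows of the query output above, copied by hand; the function name is hypothetical, and a hit is only a candidate for a duplicate, not proof of one.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Set, Tuple

# (page_id, rev_id, page_change_kind) copied from the query output above
rows = [
    (100282687, 1118294200, "edit"),
    (100282687, 1118806092, "edit"),
    (100282687, 1118806092, "delete"),
    (100282687, 1118294200, "undelete"),
    (100282687, 1119085982, "move"),
    (100282687, 1119086355, "edit"),
    (178775087, 1118806092, "undelete"),
    (178775087, 1119092139, "edit"),
    (100282687, 1125950709, "edit"),
    (178775087, 1125950925, "edit"),
]


def revisions_on_multiple_pages(rows: Iterable[Tuple[int, int, str]]) -> Dict[int, List[int]]:
    """Map each rev_id seen under 2+ page_ids to the sorted list of those page_ids."""
    pages_by_rev: Dict[int, Set[int]] = defaultdict(set)
    for page_id, rev_id, _kind in rows:
        pages_by_rev[rev_id].add(page_id)
    return {rev: sorted(pages) for rev, pages in pages_by_rev.items() if len(pages) > 1}


print(revisions_on_multiple_pages(rows))
# {1118806092: [100282687, 178775087]}
```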

Tue, Jan 13, 5:03 PM · Data-Engineering (Q3 FY25/26 January 1st - March 31th)

Jan 12 2026

APizzata-WMF added a comment to T411803: Fix reconcile bug where user_id is not being populated correctly..

Done T410431#11512827

Jan 12 2026, 3:48 PM · Data-Engineering (Q3 FY25/26 January 1st - March 31th)
APizzata-WMF added a comment to T410431: Troubleshoot duplicates issue in mw_content_merge_events_to_mw_content_history_daily.

Ran a deduplication like the one done in https://phabricator.wikimedia.org/T404975#11197939 :
Duplicate situation for mediawiki_content_history_v1:

spark.sql("""
SELECT count(*) as total_duplicate
FROM (
    SELECT
         count(1) as count,
         wiki_id,
         revision_id
    FROM wmf_content.mediawiki_content_history_v1
    GROUP BY wiki_id, revision_id
    HAVING count > 1
)
""").show(3000, truncate=False)
Jan 12 2026, 3:46 PM · Data-Engineering (Q3 FY25/26 January 1st - March 31th)
APizzata-WMF added a comment to T411803: Fix reconcile bug where user_id is not being populated correctly..

Hey @xcollazo, I will run the deduplication heuristic and will post here when everything is clean.

Jan 12 2026, 1:48 PM · Data-Engineering (Q3 FY25/26 January 1st - March 31th)

Jan 8 2026

APizzata-WMF added a comment to T413888: duplicated page_title in mediawiki_content_current_v1.

Great find! I think I see the root cause now, but I will let you play with it more and come to your own conclusions!

Is it connected to the DELETE operation only being able to be used with the WHEN MATCHED clause?
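The question refers to MERGE semantics: in the standard WHEN MATCHED / WHEN NOT MATCHED form, DELETE is only available as a WHEN MATCHED action, and WHEN NOT MATCHED can only INSERT. The toy model below (a hypothetical dict-based "current" table keyed by (wiki_id, page_id), not the real job) shows one way a delete with no matching target row becomes a silent no-op, leaving two rows with the same page_title.

```python
from typing import Dict, List, Tuple

Key = Tuple[str, int]  # (wiki_id, page_id)


def merge_current(target: Dict[Key, str],
                  source: List[Tuple[Key, str, str]]) -> Dict[Key, str]:
    """Toy MERGE with these clauses:

    WHEN MATCHED AND kind = 'delete' THEN DELETE
    WHEN MATCHED THEN UPDATE
    WHEN NOT MATCHED AND kind <> 'delete' THEN INSERT
    """
    result = dict(target)
    for key, title, kind in source:
        if key in result:
            if kind == "delete":
                del result[key]  # DELETE is only reachable on a match
            else:
                result[key] = title
        elif kind != "delete":
            result[key] = title
        # an unmatched delete falls through: nothing to remove
    return result


target = {("enwiki", 1): "Foo"}
source = [
    (("enwiki", 2), "Foo", "delete"),  # hypothetical: delete arrives under a page_id not in target
    (("enwiki", 3), "Foo", "create"),  # a new page with the same title
]
out = merge_current(target, source)
print(list(out.values()).count("Foo"))  # 2
```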

Jan 8 2026, 5:25 PM · Patch-For-Review, Data-Engineering (Q3 FY25/26 January 1st - March 31th)
APizzata-WMF added a comment to T413888: duplicated page_title in mediawiki_content_current_v1.

The pages in the mediawiki_content_current_v1 table that appear duplicated are actually caused by a delete operation that is not executed when the current table is built.

Jan 8 2026, 3:11 PM · Patch-For-Review, Data-Engineering (Q3 FY25/26 January 1st - March 31th)
APizzata-WMF updated subscribers of T413075: Review ALIS image suggestion pipeline.
Jan 8 2026, 1:06 PM · Data-Engineering (Q3 FY25/26 January 1st - March 31th)

Jan 7 2026

APizzata-WMF added a comment to T413075: Review ALIS image suggestion pipeline.

Performed a first round of study and analysis; waiting for a call with Marco scheduled for Thursday.

Jan 7 2026, 11:42 AM · Data-Engineering (Q3 FY25/26 January 1st - March 31th)
APizzata-WMF moved T413888: duplicated page_title in mediawiki_content_current_v1 from Next Up to In progress on the Data-Engineering (Q2 FY25/26 October 1st - December 31th) board.
Jan 7 2026, 11:42 AM · Patch-For-Review, Data-Engineering (Q3 FY25/26 January 1st - March 31th)
APizzata-WMF created T413950: Reset MFA for a-pizzata.
Jan 7 2026, 9:22 AM · Phabricator

Jan 6 2026

APizzata-WMF added a comment to T410431: Troubleshoot duplicates issue in mw_content_merge_events_to_mw_content_history_daily.

Regarding T410431#11492207: after a call with @xcollazo, we determined that this duplication in the current table is probably due to a separate bug. Created T413888.

Jan 6 2026, 4:14 PM · Data-Engineering (Q3 FY25/26 January 1st - March 31th)
APizzata-WMF claimed T413888: duplicated page_title in mediawiki_content_current_v1.
Jan 6 2026, 4:13 PM · Patch-For-Review, Data-Engineering (Q3 FY25/26 January 1st - March 31th)
APizzata-WMF created T413888: duplicated page_title in mediawiki_content_current_v1.
Jan 6 2026, 4:12 PM · Patch-For-Review, Data-Engineering (Q3 FY25/26 January 1st - March 31th)

Jan 5 2026

APizzata-WMF moved T413075: Review ALIS image suggestion pipeline from Next Up to In progress on the Data-Engineering (Q2 FY25/26 October 1st - December 31th) board.
Jan 5 2026, 12:39 PM · Data-Engineering (Q3 FY25/26 January 1st - March 31th)

Dec 19 2025

APizzata-WMF added a comment to T401892: Update MediaWiki Content History SLO draft for SRE review.

Uploaded a new set of metrics with the following names:

  • wmf_content_mediawiki_content_history_v1_completeness_sli_days: counter of the days on which the metric has been executed
  • wmf_content_mediawiki_content_history_v1_completeness_sli_alerts: count of the alerts raised when the completeness of the table falls below the SLO threshold.
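The relationship between the two counters can be sketched in plain Python: every daily run increments the days counter, and a run whose completeness is under the threshold also increments the alerts counter. This is a minimal sketch, assuming completeness is a ratio in [0, 1]; the 0.99 threshold and the daily values are hypothetical, and exporting the values to Prometheus is out of scope here.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass
class CompletenessCounters:
    sli_days: int = 0    # ..._completeness_sli_days
    sli_alerts: int = 0  # ..._completeness_sli_alerts


def update_counters(daily_completeness: Iterable[float],
                    slo_threshold: float) -> CompletenessCounters:
    counters = CompletenessCounters()
    for value in daily_completeness:
        counters.sli_days += 1        # one increment per executed run
        if value < slo_threshold:
            counters.sli_alerts += 1  # run under the SLO threshold
    return counters


print(update_counters([0.999, 0.97, 0.995], slo_threshold=0.99))
# CompletenessCounters(sli_days=3, sli_alerts=1)
```

Keeping both as monotonically increasing counters lets a dashboard derive the share of alerting days over any window as the ratio of the two increases.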
Dec 19 2025, 3:47 PM · Data-Engineering (Q3 FY25/26 January 1st - March 31th)

Dec 17 2025

APizzata-WMF added a comment to T410431: Troubleshoot duplicates issue in mw_content_merge_events_to_mw_content_history_daily.

Regarding this:

I was discussing this issue with @JAllemandou, and he mentioned that these rows could very well be coming from the logging table. See discussion in an MR comment thread here.

Dec 17 2025, 5:14 PM · Data-Engineering (Q3 FY25/26 January 1st - March 31th)
APizzata-WMF added a comment to T411803: Fix reconcile bug where user_id is not being populated correctly..

Executed deduplication steps as we discussed: T410431#11469747

Dec 17 2025, 5:05 PM · Data-Engineering (Q3 FY25/26 January 1st - March 31th)
APizzata-WMF added a comment to T410431: Troubleshoot duplicates issue in mw_content_merge_events_to_mw_content_history_daily.

Ran a deduplication like the one done in https://phabricator.wikimedia.org/T404975#11197939 :

Dec 17 2025, 5:03 PM · Data-Engineering (Q3 FY25/26 January 1st - March 31th)

Dec 16 2025

APizzata-WMF added a comment to T412119: Access Admin menu in Airflow.

Thanks @BTullis, I can now see the menu!

Dec 16 2025, 4:28 PM · SRE, LDAP-Access-Requests, Data-Platform-SRE (2025.11.07 - 2025.11.28)
APizzata-WMF added a comment to T410431: Troubleshoot duplicates issue in mw_content_merge_events_to_mw_content_history_daily.

After a discussion with @xcollazo we realised that the problem is connected to a combination of undelete and delete events.

Dec 16 2025, 4:27 PM · Data-Engineering (Q3 FY25/26 January 1st - March 31th)

Dec 15 2025

APizzata-WMF moved T401892: Update MediaWiki Content History SLO draft for SRE review from In progress to Ready to Deploy on the Data-Engineering (Q2 FY25/26 October 1st - December 31th) board.
Dec 15 2025, 4:37 PM · Data-Engineering (Q3 FY25/26 January 1st - March 31th)
APizzata-WMF moved T410431: Troubleshoot duplicates issue in mw_content_merge_events_to_mw_content_history_daily from Next Up to In progress on the Data-Engineering (Q2 FY25/26 October 1st - December 31th) board.
Dec 15 2025, 4:36 PM · Data-Engineering (Q3 FY25/26 January 1st - March 31th)

Dec 9 2025

APizzata-WMF created T412119: Access Admin menu in Airflow.
Dec 9 2025, 2:54 PM · SRE, LDAP-Access-Requests, Data-Platform-SRE (2025.11.07 - 2025.11.28)

Dec 8 2025

APizzata-WMF claimed T410431: Troubleshoot duplicates issue in mw_content_merge_events_to_mw_content_history_daily.
Dec 8 2025, 4:39 PM · Data-Engineering (Q3 FY25/26 January 1st - March 31th)
APizzata-WMF moved T411973: change metric name for prometheus slo from In progress to Done on the Data-Engineering (Q2 FY25/26 October 1st - December 31th) board.
Dec 8 2025, 3:59 PM · Data-Engineering (Q2 FY25/26 October 1st - December 31th)
APizzata-WMF added a comment to T410431: Troubleshoot duplicates issue in mw_content_merge_events_to_mw_content_history_daily.

Current situation after the monthly reconciliation (query executed on 2025-12-08):

spark.sql("""
SELECT count(*) as total_duplicates
FROM (
    SELECT
         count(1) as count,
         wiki_id,
         revision_id
    FROM wmf_content.mediawiki_content_history_v1
    GROUP BY wiki_id, revision_id
    HAVING count > 1
)
""").show(3000, truncate=False)
Dec 8 2025, 3:57 PM · Data-Engineering (Q3 FY25/26 January 1st - March 31th)
APizzata-WMF moved T411973: change metric name for prometheus slo from Next Up to In progress on the Data-Engineering (Q2 FY25/26 October 1st - December 31th) board.
Dec 8 2025, 8:28 AM · Data-Engineering (Q2 FY25/26 October 1st - December 31th)
APizzata-WMF created T411973: change metric name for prometheus slo.
Dec 8 2025, 8:27 AM · Data-Engineering (Q2 FY25/26 October 1st - December 31th)

Dec 3 2025

APizzata-WMF moved T410579: Implement Mediawiki Content History SLO monitoring and alerting from In progress to Done on the Data-Engineering (Q2 FY25/26 October 1st - December 31th) board.
Dec 3 2025, 3:32 PM · Data-Engineering (Q2 FY25/26 October 1st - December 31th)

Nov 26 2025

APizzata-WMF moved T410579: Implement Mediawiki Content History SLO monitoring and alerting from Next Up to In progress on the Data-Engineering (Q2 FY25/26 October 1st - December 31th) board.
Nov 26 2025, 9:14 AM · Data-Engineering (Q2 FY25/26 October 1st - December 31th)
APizzata-WMF added a comment to T401892: Update MediaWiki Content History SLO draft for SRE review.

All the action items from my side have been published in the document.

Nov 26 2025, 9:14 AM · Data-Engineering (Q3 FY25/26 January 1st - March 31th)

Nov 24 2025

APizzata-WMF added a comment to T401892: Update MediaWiki Content History SLO draft for SRE review.

Action items in the previously linked document:

Nov 24 2025, 11:55 AM · Data-Engineering (Q3 FY25/26 January 1st - March 31th)

Nov 21 2025

APizzata-WMF moved T401892: Update MediaWiki Content History SLO draft for SRE review from Blocked/Paused to In progress on the Data-Engineering (Q2 FY25/26 October 1st - December 31th) board.
Nov 21 2025, 11:02 AM · Data-Engineering (Q3 FY25/26 January 1st - March 31th)

Nov 20 2025

APizzata-WMF added a comment to T401892: Update MediaWiki Content History SLO draft for SRE review.

Updated the document: https://wikitech.wikimedia.org/wiki/SLO/MediaWiki_Content_History_Table

Nov 20 2025, 4:00 PM · Data-Engineering (Q3 FY25/26 January 1st - March 31th)

Nov 6 2025

APizzata-WMF created T409470: mediawiki_history_dumps failure.
Nov 6 2025, 5:35 PM · Data-Engineering (Q2 FY25/26 October 1st - December 31th)
APizzata-WMF edited P84877 completeness SLO mwch.
Nov 6 2025, 10:38 AM

Nov 5 2025

APizzata-WMF added a comment to T401892: Update MediaWiki Content History SLO draft for SRE review.

I have created P84877 with the results of the query. I have also changed the final columns to follow a more generic approach that is not specialised for MWCH. The query now groups by wiki_id, so there is no longer any need to iterate over the wikis.
The spark configuration tested is the following:

config = {
    "spark.driver.memory": "16g",
    "spark.driver.cores": 4,
    "spark.driver.maxResultSize": "8g",
    "spark.dynamicAllocation.maxExecutors": 32,
    "spark.executor.memory": "16g",
    "spark.executor.cores": 2,
    "spark.sql.shuffle.partitions": 512,
}

I feel it could be reduced in the final development stages. Please lmk what you think about this!
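To reason about how much this configuration could be reduced, it helps to compute its peak footprint when dynamic allocation reaches maxExecutors. A back-of-the-envelope sketch, assuming one task per core (the default spark.task.cpus = 1) and counting heap only; Spark requests additional per-container memory overhead on top, which is not included. The helper names are hypothetical.

```python
config = {
    "spark.driver.memory": "16g",
    "spark.driver.cores": 4,
    "spark.driver.maxResultSize": "8g",
    "spark.dynamicAllocation.maxExecutors": 32,
    "spark.executor.memory": "16g",
    "spark.executor.cores": 2,
    "spark.sql.shuffle.partitions": 512,
}


def parse_gib(size: str) -> int:
    """Parse Spark size strings such as "16g"; only the "g" suffix is handled here."""
    if not size.lower().endswith("g"):
        raise ValueError(f"unsupported size string: {size!r}")
    return int(size[:-1])


def peak_footprint(cfg: dict) -> tuple:
    """(heap GiB, cores, shuffle task waves) with all executors allocated."""
    executors = cfg["spark.dynamicAllocation.maxExecutors"]
    task_slots = executors * cfg["spark.executor.cores"]
    memory = (executors * parse_gib(cfg["spark.executor.memory"])
              + parse_gib(cfg["spark.driver.memory"]))
    cores = task_slots + cfg["spark.driver.cores"]
    waves = -(-cfg["spark.sql.shuffle.partitions"] // task_slots)  # ceiling division
    return memory, cores, waves


print(peak_footprint(config))  # (528, 68, 8)
```

With 64 task slots, 512 shuffle partitions run in 8 waves; halving maxExecutors would halve the executor heap and double the number of waves, which is one way to trade wall-clock time for cluster share.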

Nov 5 2025, 11:03 AM · Data-Engineering (Q3 FY25/26 January 1st - March 31th)
APizzata-WMF edited P84877 completeness SLO mwch.
Nov 5 2025, 10:59 AM
APizzata-WMF edited P84877 completeness SLO mwch.
Nov 5 2025, 10:54 AM
APizzata-WMF created P84877 completeness SLO mwch.
Nov 5 2025, 10:51 AM

Oct 31 2025

APizzata-WMF added a comment to T401892: Update MediaWiki Content History SLO draft for SRE review.

I have updated the query:

Oct 31 2025, 3:42 PM · Data-Engineering (Q3 FY25/26 January 1st - March 31th)

Oct 29 2025

APizzata-WMF claimed T401892: Update MediaWiki Content History SLO draft for SRE review.
Oct 29 2025, 5:40 PM · Data-Engineering (Q3 FY25/26 January 1st - March 31th)

Oct 27 2025

APizzata-WMF moved T400360: Fix Hive event.development_network_probe table from Ready to Deploy to Done on the Data-Engineering (Q2 FY25/26 October 1st - December 31th) board.
Oct 27 2025, 3:18 PM · Data-Engineering (Q2 FY25/26 October 1st - December 31th), Traffic
APizzata-WMF moved T400360: Fix Hive event.development_network_probe table from Done to Ready to Deploy on the Data-Engineering (Q2 FY25/26 October 1st - December 31th) board.
Oct 27 2025, 3:18 PM · Data-Engineering (Q2 FY25/26 October 1st - December 31th), Traffic
APizzata-WMF moved T400360: Fix Hive event.development_network_probe table from Blocked/Paused to In progress on the Data-Engineering (Q2 FY25/26 October 1st - December 31th) board.
Oct 27 2025, 3:18 PM · Data-Engineering (Q2 FY25/26 October 1st - December 31th), Traffic
APizzata-WMF moved T405040: Global Editor Metrics - backfill pageview metric data from Next Up to In progress on the Data-Engineering (Q2 FY25/26 October 1st - December 31th) board.
Oct 27 2025, 3:09 PM · Data-Engineering (Q3 FY25/26 January 1st - March 31th), OKR-Work, MediaWiki-Page-derived-data

Oct 24 2025

APizzata-WMF added a comment to T401892: Update MediaWiki Content History SLO draft for SRE review.

What about the source data we ingest?
I was thinking that a possible completeness measure should also include the source event.mediawiki_page_content_change_v1 table, unioned with wmf_content.inconsistent_rows_of_mediawiki_content_history_v1. Something like this:

Oct 24 2025, 11:28 AM · Data-Engineering (Q3 FY25/26 January 1st - March 31th)

Oct 14 2025

APizzata-WMF created T407228: Requesting access to "analytics-admins" and "deployment" groups for a-pizzata.
Oct 14 2025, 1:12 PM · SRE, SRE-Access-Requests

Oct 10 2025

APizzata-WMF added a comment to T406328: Requesting access to Data Platform for a-pizzata.

@MoritzMuehlenhoff perfect, thank you very much!

Oct 10 2025, 7:50 AM · SRE, SRE-Access-Requests

Oct 3 2025

APizzata-WMF updated the task description for T406328: Requesting access to Data Platform for a-pizzata.
Oct 3 2025, 1:10 PM · SRE, SRE-Access-Requests
APizzata-WMF created T406328: Requesting access to Data Platform for a-pizzata.
Oct 3 2025, 1:06 PM · SRE, SRE-Access-Requests