
xcollazo (Xabriel J. Collazo Mojica)
Sr. Software Engineer for Wikimedia

User Details

User Since
Jun 9 2022, 6:42 PM (90 w, 3 d)
Availability
Available
LDAP User
Unknown
MediaWiki User
XCollazo-WMF

Recent Activity

Fri, Mar 1

xcollazo moved T358120: Write a Dumps 2.0 requirements doc with emphasis on a production intermediate table from Code Review / Tech Input to Sign Off on the Data Products (Data Products Sprint 10) board.
Fri, Mar 1, 8:34 PM · Data Products (Data Products Sprint 10)
xcollazo moved T358120: Write a Dumps 2.0 requirements doc with emphasis on a production intermediate table from In Process to Code Review / Tech Input on the Data Products (Data Products Sprint 10) board.
Fri, Mar 1, 8:34 PM · Data Products (Data Products Sprint 10)
xcollazo updated the task description for T358120: Write a Dumps 2.0 requirements doc with emphasis on a production intermediate table.
Fri, Mar 1, 8:34 PM · Data Products (Data Products Sprint 10)
xcollazo added a comment to T358120: Write a Dumps 2.0 requirements doc with emphasis on a production intermediate table.

I think all the asks from the current run of comments have been addressed in the document.

Fri, Mar 1, 8:33 PM · Data Products (Data Products Sprint 10)
xcollazo updated the task description for T358120: Write a Dumps 2.0 requirements doc with emphasis on a production intermediate table.
Fri, Mar 1, 8:32 PM · Data Products (Data Products Sprint 10)
xcollazo created T358886: Decision records for Dumps 2.0.
Fri, Mar 1, 5:35 PM · Epic, Data Products
xcollazo created T358883: Define SLOs for the intermediate table of Dumps 2.0.
Fri, Mar 1, 5:33 PM · Epic, Data Products
xcollazo added a subtask for T358877: Dumps 2.0 - Production intermediate table milestone: T358375: Declare wmf_dumps.wikitext_raw a production table.
Fri, Mar 1, 5:27 PM · Data Products (Epics Timeline), Dumps 2.0, Epic
xcollazo added a parent task for T358375: Declare wmf_dumps.wikitext_raw a production table: T358877: Dumps 2.0 - Production intermediate table milestone.
Fri, Mar 1, 5:27 PM · Data Products
xcollazo added a subtask for T358877: Dumps 2.0 - Production intermediate table milestone: T358374: Remove historical errors from errors column on wmf_dumps.wikitext_raw_rc2 intermediate table.
Fri, Mar 1, 5:27 PM · Data Products (Epics Timeline), Dumps 2.0, Epic
xcollazo added a parent task for T358374: Remove historical errors from errors column on wmf_dumps.wikitext_raw_rc2 intermediate table: T358877: Dumps 2.0 - Production intermediate table milestone.
Fri, Mar 1, 5:27 PM · Data Products
xcollazo added a subtask for T358877: Dumps 2.0 - Production intermediate table milestone: T358366: Consult with Product and Research team on schema and data retention expectations for wmf_dumps.wikitext_raw.
Fri, Mar 1, 5:27 PM · Data Products (Epics Timeline), Dumps 2.0, Epic
xcollazo added a parent task for T358366: Consult with Product and Research team on schema and data retention expectations for wmf_dumps.wikitext_raw: T358877: Dumps 2.0 - Production intermediate table milestone.
Fri, Mar 1, 5:27 PM · Data Products
xcollazo added a subtask for T358877: Dumps 2.0 - Production intermediate table milestone: T358373: PySpark job to detect and fetch missing/corrupted revisions.
Fri, Mar 1, 5:26 PM · Data Products (Epics Timeline), Dumps 2.0, Epic
xcollazo added a parent task for T358373: PySpark job to detect and fetch missing/corrupted revisions: T358877: Dumps 2.0 - Production intermediate table milestone.
Fri, Mar 1, 5:26 PM · Data Products
xcollazo added a subtask for T358877: Dumps 2.0 - Production intermediate table milestone: T358365: Implement dataset maintenance config for wmf_dumps.wikitext_raw.
Fri, Mar 1, 5:26 PM · Data Products (Epics Timeline), Dumps 2.0, Epic
xcollazo added a parent task for T358365: Implement dataset maintenance config for wmf_dumps.wikitext_raw: T358877: Dumps 2.0 - Production intermediate table milestone.
Fri, Mar 1, 5:26 PM · Data Products
xcollazo added a subtask for T358877: Dumps 2.0 - Production intermediate table milestone: T338065: [Iceberg Migration] Implement mechanism for automatic Iceberg data deletion and optimization.
Fri, Mar 1, 5:26 PM · Data Products (Epics Timeline), Dumps 2.0, Epic
xcollazo added a parent task for T338065: [Iceberg Migration] Implement mechanism for automatic Iceberg data deletion and optimization: T358877: Dumps 2.0 - Production intermediate table milestone.
Fri, Mar 1, 5:26 PM · Data-Engineering
xcollazo added a parent task for T340466: [Iceberg Migration] P.O.C. on Iceberg sensor using Postgres table to keep status of updates: T358877: Dumps 2.0 - Production intermediate table milestone.
Fri, Mar 1, 5:25 PM · Data-Platform-SRE, Data-Engineering
xcollazo added a subtask for T358877: Dumps 2.0 - Production intermediate table milestone: T340466: [Iceberg Migration] P.O.C. on Iceberg sensor using Postgres table to keep status of updates.
Fri, Mar 1, 5:25 PM · Data Products (Epics Timeline), Dumps 2.0, Epic
xcollazo added a subtask for T358877: Dumps 2.0 - Production intermediate table milestone: T356866: [Data Quality] Update data_quality schemas to be compatible with Iceberg tables.
Fri, Mar 1, 5:25 PM · Data Products (Epics Timeline), Dumps 2.0, Epic
xcollazo added a parent task for T356866: [Data Quality] Update data_quality schemas to be compatible with Iceberg tables: T358877: Dumps 2.0 - Production intermediate table milestone.
Fri, Mar 1, 5:25 PM · Data-Engineering (Sprint 9)
xcollazo added a subtask for T358877: Dumps 2.0 - Production intermediate table milestone: T345195: [Data Quality] [SPIKE] Can we identify indicators to inform an SLO for event emission and intake?.
Fri, Mar 1, 5:25 PM · Data Products (Epics Timeline), Dumps 2.0, Epic
xcollazo added a parent task for T345195: [Data Quality] [SPIKE] Can we identify indicators to inform an SLO for event emission and intake?: T358877: Dumps 2.0 - Production intermediate table milestone.
Fri, Mar 1, 5:25 PM · Data-Engineering, Event-Platform
xcollazo added a subtask for T358877: Dumps 2.0 - Production intermediate table milestone: T354761: Implement first set of data quality checks.
Fri, Mar 1, 5:22 PM · Data Products (Epics Timeline), Dumps 2.0, Epic
xcollazo added a parent task for T354761: Implement first set of data quality checks: T358877: Dumps 2.0 - Production intermediate table milestone.
Fri, Mar 1, 5:22 PM · Data Products (Data Products Sprint 09), Dumps 2.0
xcollazo closed T345440: Make it easier to run custom Spark versions via for_virtual_env() as Resolved.

A summary of the original issues, for closure:

Fri, Mar 1, 4:14 PM · Data Products, Dumps 2.0
xcollazo closed T345440: Make it easier to run custom Spark versions via for_virtual_env(), a subtask of T330296: Proof of concept for MediaWiki XML content dump via Event Platform, Iceberg and Spark, as Resolved.
Fri, Mar 1, 4:14 PM · Data Products (Epics Timeline), Data Pipelines, Epic
xcollazo removed a subtask for T330296: Proof of concept for MediaWiki XML content dump via Event Platform, Iceberg and Spark: T351564: Implement enriched revision visibility stream.
Fri, Mar 1, 4:04 PM · Data Products (Epics Timeline), Data Pipelines, Epic
xcollazo added a subtask for T358877: Dumps 2.0 - Production intermediate table milestone: T351564: Implement enriched revision visibility stream.
Fri, Mar 1, 4:04 PM · Data Products (Epics Timeline), Dumps 2.0, Epic
xcollazo edited parent tasks for T351564: Implement enriched revision visibility stream, added: T358877: Dumps 2.0 - Production intermediate table milestone; removed: T330296: Proof of concept for MediaWiki XML content dump via Event Platform, Iceberg and Spark.
Fri, Mar 1, 4:04 PM · Dumps 2.0, Data Products
xcollazo removed a subtask for T346378: Update XML dump generation code to use wmf_dumps.wikitext_raw_rc1 schema.: T347611: Document new wmf_dumps tables.
Fri, Mar 1, 4:02 PM · Data Products (Sprint 02), Dumps 2.0
xcollazo added a subtask for T358877: Dumps 2.0 - Production intermediate table milestone: T347611: Document new wmf_dumps tables.
Fri, Mar 1, 4:02 PM · Data Products (Epics Timeline), Dumps 2.0, Epic
xcollazo edited parent tasks for T347611: Document new wmf_dumps tables, added: T358877: Dumps 2.0 - Production intermediate table milestone; removed: T346378: Update XML dump generation code to use wmf_dumps.wikitext_raw_rc1 schema..
Fri, Mar 1, 4:02 PM · Data Products, Documentation, Dumps 2.0
xcollazo created T358877: Dumps 2.0 - Production intermediate table milestone.
Fri, Mar 1, 4:01 PM · Data Products (Epics Timeline), Dumps 2.0, Epic
xcollazo awarded T358691: Hadoop datanode on an-worker1173 is showing errors a Pterodactyl token.
Fri, Mar 1, 3:26 PM · Data-Platform-SRE (2024.02.12 - 2024.03.03)

Thu, Feb 29

xcollazo renamed T330296: Proof of concept for MediaWiki XML content dump via Event Platform, Iceberg and Spark from Make MediaWiki XML content dump available for external consumption to Proof of concept for MediaWiki XML content dump via Event Platform, Iceberg and Spark.
Thu, Feb 29, 6:44 PM · Data Products (Epics Timeline), Data Pipelines, Epic
xcollazo added a comment to T345195: [Data Quality] [SPIKE] Can we identify indicators to inform an SLO for event emission and intake?.

This work will be critical for productionizing Dumps 2.0.

Thu, Feb 29, 3:55 PM · Data-Engineering, Event-Platform
xcollazo added a comment to T355588: Modify ClickStreamBuilder pipeline to cope with pagelinks schema changes.

Ah, got it! Thank you both!

Thu, Feb 29, 3:29 PM · Data-Engineering, Data Products

Wed, Feb 28

xcollazo placed T342911: Data Quality Issue: Wikitext History Job fail / rerun in Airflow up for grabs.

The most recent run of this job (which finished today) still had a retry.
...
Should we expect duplicate data in mediawiki_wikitext_history or has that been cleaned up?

Wed, Feb 28, 10:20 PM · Movement-Metrics, Movement-Insights, Data Products
xcollazo updated subscribers of T355588: Modify ClickStreamBuilder pipeline to cope with pagelinks schema changes.

@lbowmaker clickstream_monthly_dag.py sensors typically take until the 3rd of the month to succeed, so we have about 4 days until this breaks.

Wed, Feb 28, 10:16 PM · Data-Engineering, Data Products
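The "about 4 days" estimate can be checked with a little date arithmetic. This is a minimal sketch, assuming the sensors succeed on the 3rd of the following month as the comment says (actual sensor timing varies). Note that 2024 is a leap year, so Feb 29 adds one day compared with a non-leap February. The helper names are hypothetical.

```python
from datetime import date


def next_month_deadline(today: date, day_of_month: int = 3) -> date:
    """Return the given day of the month after `today` (hypothetical helper)."""
    if today.month == 12:
        return date(today.year + 1, 1, day_of_month)
    return date(today.year, today.month + 1, day_of_month)


def days_until(start: date, deadline: date) -> int:
    """Whole calendar days from `start` to `deadline` (negative if already past)."""
    return (deadline - start).days


comment_date = date(2024, 2, 28)
print(days_until(comment_date, next_month_deadline(comment_date)))  # → 4
```

The same calculation from Feb 28, 2023 gives 3 days, since there is no Feb 29 in that year.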
xcollazo moved T355599: [SPIKE] Draft of Mediawiki extension proposal for Metrics Platform Instrumentation (& Experimentation) from In Process to Code Review / Tech Input on the Data Products (Data Products Sprint 10) board.
Wed, Feb 28, 5:17 PM · Data Products (Data Products Sprint 10), Data-Engineering, Event-Platform (Sprint 09), Metrics Platform Backlog
xcollazo moved T358120: Write a Dumps 2.0 requirements doc with emphasis on a production intermediate table from Paused to In Process on the Data Products (Data Products Sprint 10) board.
Wed, Feb 28, 5:12 PM · Data Products (Data Products Sprint 10)
xcollazo moved T350497: Update the WikiLambda instrumentation to use core interaction events from Code Review / Tech Input to In Process on the Data Products (Data Products Sprint 10) board.
Wed, Feb 28, 5:12 PM · Data Products (Data Products Sprint 10), Patch-For-Review, Abstract Wikipedia team, WikiLambda Front-end, Metrics Platform Backlog
xcollazo moved T355409: AQS 2.0: Aqsassist and test envs. Make changes corresponding to mediawiki history reduced snapshot automation from Code Review / Tech Input to To Deploy on the Data Products (Data Products Sprint 10) board.
Wed, Feb 28, 5:09 PM · Data Products (Data Products Sprint 10), AQS2.0
xcollazo claimed T348772: Investigate why a SELECT count(1) takes 1.4 hours to plan for wikidata_raw_rc1.
Wed, Feb 28, 4:27 PM · Data Products (Data Products Sprint 10)
xcollazo moved T358458: 20240220 database backup dump appears stuck from In Process to Done on the Data Products (Data Products Sprint 10) board.
Wed, Feb 28, 4:27 PM · User-brennen, Data Products (Data Products Sprint 10), Dumps-Generation
xcollazo added a comment to T358458: 20240220 database backup dump appears stuck.

All dumps marked as complete now.

Wed, Feb 28, 4:27 PM · User-brennen, Data Products (Data Products Sprint 10), Dumps-Generation

Tue, Feb 27

xcollazo moved T353787: Decom dumpsdata100[1-2] from Backlog to Other teams on the Dumps-Generation board.
Tue, Feb 27, 3:56 PM · Data-Platform-SRE (2024.03.04 - 2024.03.24), Dumps-Generation
xcollazo moved T358458: 20240220 database backup dump appears stuck from Backlog to Active on the Dumps-Generation board.
Tue, Feb 27, 3:56 PM · User-brennen, Data Products (Data Products Sprint 10), Dumps-Generation
xcollazo moved T348772: Investigate why a SELECT count(1) takes 1.4 hours to plan for wikidata_raw_rc1 from Sprint Backlog to Done on the Data Products (Data Products Sprint 10) board.
Tue, Feb 27, 3:51 PM · Data Products (Data Products Sprint 10)
xcollazo updated the task description for T347611: Document new wmf_dumps tables.
Tue, Feb 27, 3:51 PM · Data Products, Documentation, Dumps 2.0
xcollazo added a comment to T348772: Investigate why a SELECT count(1) takes 1.4 hours to plan for wikidata_raw_rc1.

Reran the query, but this time on the new stat1011:

Tue, Feb 27, 3:50 PM · Data Products (Data Products Sprint 10)
xcollazo updated the task description for T348772: Investigate why a SELECT count(1) takes 1.4 hours to plan for wikidata_raw_rc1.
Tue, Feb 27, 3:47 PM · Data Products (Data Products Sprint 10)
xcollazo edited projects for T348772: Investigate why a SELECT count(1) takes 1.4 hours to plan for wikidata_raw_rc1, added: Data Products (Data Products Sprint 10); removed: Data Products.
Tue, Feb 27, 3:45 PM · Data Products (Data Products Sprint 10)
xcollazo added a comment to T358458: 20240220 database backup dump appears stuck.

Most dumps now marked as "Dump complete".

Tue, Feb 27, 2:46 PM · User-brennen, Data Products (Data Products Sprint 10), Dumps-Generation

Mon, Feb 26

xcollazo added a comment to T358458: 20240220 database backup dump appears stuck.

https://dumps.wikimedia.org/commonswiki/20240220/ showing progress.

Mon, Feb 26, 9:01 PM · User-brennen, Data Products (Data Products Sprint 10), Dumps-Generation
xcollazo added a comment to T358458: 20240220 database backup dump appears stuck.

Another node has picked up the job:

dumpsgen@snapshot1010:/mnt/dumpsdata/xmldatadumps/private/commonswiki$ cat lock_20240220 
snapshot1011.eqiad.wmnet 4038
Mon, Feb 26, 8:42 PM · User-brennen, Data Products (Data Products Sprint 10), Dumps-Generation
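The lock file shown above holds a host name and a number. A minimal sketch for reading it, assuming the format is always a single line of `<fqdn> <pid>` (inferred from this one sample; the second field is assumed to be a process ID). `DumpLock`, `parse_dump_lock`, and `held_by_other_host` are hypothetical names, not part of the dumps tooling.

```python
from typing import NamedTuple


class DumpLock(NamedTuple):
    """Holder of a dump run lock (assumed format: '<fqdn> <pid>')."""
    host: str
    pid: int


def parse_dump_lock(text: str) -> DumpLock:
    # Assumption: exactly two whitespace-separated fields, host then PID.
    fields = text.split()
    if len(fields) != 2:
        raise ValueError(f"unexpected lock file contents: {text!r}")
    host, pid = fields
    return DumpLock(host=host, pid=int(pid))


def read_dump_lock(path: str) -> DumpLock:
    """Read and parse a lock file such as .../commonswiki/lock_20240220."""
    with open(path, encoding="utf-8") as fh:
        return parse_dump_lock(fh.read())


def short_host(name: str) -> str:
    # 'snapshot1011.eqiad.wmnet' -> 'snapshot1011'
    return name.split(".", 1)[0]


def held_by_other_host(lock: DumpLock, current_host: str) -> bool:
    """True when the lock belongs to a different node than `current_host`."""
    return short_host(lock.host) != short_host(current_host)
```

With the contents above, `parse_dump_lock("snapshot1011.eqiad.wmnet 4038")` yields host `snapshot1011.eqiad.wmnet` and PID 4038, and `held_by_other_host(..., "snapshot1010")` is true, matching the observation that another node picked up the job.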
xcollazo added a comment to T358458: 20240220 database backup dump appears stuck.

As per https://wikitech.wikimedia.org/wiki/Dumps/Troubleshooting, we should kill the offending commonswiki dump job, and systemd should restart it automatically.

Mon, Feb 26, 8:15 PM · User-brennen, Data Products (Data Products Sprint 10), Dumps-Generation
xcollazo moved T357143: wmf_dumps.wikitext_raw_rc2 backfill failing with FetchFailedException from In Process to Done on the Data Products (Data Products Sprint 10) board.
Mon, Feb 26, 6:28 PM · Data Products (Data Products Sprint 10)
xcollazo added a comment to T357143: wmf_dumps.wikitext_raw_rc2 backfill failing with FetchFailedException.

The spark_create_intermediate_table job did not show evidence of FetchFailedExceptions. There were retries, but they were unrelated. It also finished ~21% faster as shown below.

Mon, Feb 26, 6:28 PM · Data Products (Data Products Sprint 10)
xcollazo added a project to T358458: 20240220 database backup dump appears stuck: Data Products (Data Products Sprint 10).
Mon, Feb 26, 3:31 PM · User-brennen, Data Products (Data Products Sprint 10), Dumps-Generation
xcollazo changed the status of T358458: 20240220 database backup dump appears stuck from Open to In Progress.

Thanks for the report. Will investigate.

Mon, Feb 26, 3:31 PM · User-brennen, Data Products (Data Products Sprint 10), Dumps-Generation

Fri, Feb 23

xcollazo added a comment to T348772: Investigate why a SELECT count(1) takes 1.4 hours to plan for wikidata_raw_rc1.

Looked at this a bit today. Running a count(1) is currently not possible on stat1007, as my process gets killed for excessive memory usage. Here are some rough notes:

Fri, Feb 23, 9:44 PM · Data Products (Data Products Sprint 10)
xcollazo added a comment to T358120: Write a Dumps 2.0 requirements doc with emphasis on a production intermediate table.

A reasonable draft with defined tasks for a "Production level intermediate table" milestone is now available at https://docs.google.com/document/d/19KVpwWCLMJKtPy8VdcIXzPeKkeREvD1tAHeHY9I5778/edit. Some tasks need further breakdown, but I think the time estimates are good.

Fri, Feb 23, 8:22 PM · Data Products (Data Products Sprint 10)
xcollazo added a comment to T357143: wmf_dumps.wikitext_raw_rc2 backfill failing with FetchFailedException.

Spark job: https://yarn.wikimedia.org/cluster/app/application_1707226456123_87407. Finished successfully in ~17 hours. Some evidence of retries, but overall the best time so far.

Fri, Feb 23, 7:54 PM · Data Products (Data Products Sprint 10)
xcollazo created T358375: Declare wmf_dumps.wikitext_raw a production table.
Fri, Feb 23, 6:48 PM · Data Products
xcollazo updated the task description for T347611: Document new wmf_dumps tables.
Fri, Feb 23, 6:46 PM · Data Products, Documentation, Dumps 2.0
xcollazo created T358374: Remove historical errors from errors column on wmf_dumps.wikitext_raw_rc2 intermediate table.
Fri, Feb 23, 6:43 PM · Data Products
xcollazo created T358373: PySpark job to detect and fetch missing/corrupted revisions.
Fri, Feb 23, 6:40 PM · Data Products
xcollazo updated the task description for T358365: Implement dataset maintenance config for wmf_dumps.wikitext_raw.
Fri, Feb 23, 6:14 PM · Data Products
xcollazo created T358366: Consult with Product and Research team on schema and data retention expectations for wmf_dumps.wikitext_raw.
Fri, Feb 23, 6:13 PM · Data Products
xcollazo created T358365: Implement dataset maintenance config for wmf_dumps.wikitext_raw.
Fri, Feb 23, 5:56 PM · Data Products

Wed, Feb 21

xcollazo updated subscribers of T340466: [Iceberg Migration] P.O.C. on Iceberg sensor using Postgres table to keep status of updates.

@lbowmaker will the "dataset state store" work be done under this ticket, or are we closing this and opening a separate one? If the latter, could you please point me to the "dataset state store" ticket?

Wed, Feb 21, 7:44 PM · Data-Platform-SRE, Data-Engineering
xcollazo moved T354761: Implement first set of data quality checks from Code Review / Tech Input to Done on the Data Products (Data Products Sprint 09) board.
Wed, Feb 21, 7:13 PM · Data Products (Data Products Sprint 09), Dumps 2.0
xcollazo changed the status of T358120: Write a Dumps 2.0 requirements doc with emphasis on a production intermediate table from Open to In Progress.
Wed, Feb 21, 4:59 PM · Data Products (Data Products Sprint 10)
xcollazo moved T358120: Write a Dumps 2.0 requirements doc with emphasis on a production intermediate table from Sprint Backlog to In Process on the Data Products (Data Products Sprint 10) board.
Wed, Feb 21, 4:57 PM · Data Products (Data Products Sprint 10)
xcollazo created T358120: Write a Dumps 2.0 requirements doc with emphasis on a production intermediate table.
Wed, Feb 21, 4:57 PM · Data Products (Data Products Sprint 10)
xcollazo updated the task description for T357684: Hook up data drift metrics into the Data Quality Framework.
Wed, Feb 21, 3:00 PM · Dumps 2.0, Data Products

Fri, Feb 16

xcollazo reopened T338796: Rewrite all Airflow sensors that use datacenter prepartitions to depend on both datacenters as "Open".

Reopening this one.

Fri, Feb 16, 3:33 PM · Data-Engineering (Sprint 9), Data Products (Data Products Sprint 05), serviceops-radar

Thu, Feb 15

xcollazo updated the task description for T354761: Implement first set of data quality checks.
Thu, Feb 15, 9:50 PM · Data Products (Data Products Sprint 09), Dumps 2.0
xcollazo added a comment to T356866: [Data Quality] Update data_quality schemas to be compatible with Iceberg tables.

Spoke a bit about this with @xcollazo.

There's an API available for accessing partition metadata, which can be utilized to generate IDs compatible with the current partition_id format:
https://iceberg.apache.org/docs/latest/spark-queries/#snapshots

Rather, this one: https://iceberg.apache.org/docs/latest/spark-queries/#partitions

However, if I understood correctly, programmatic access to this API requires Spark 3.3 or higher.

Spark SQL access is only available in 3.3+. Programmatic access via the Iceberg Java API can, IIRC, be done with our current Spark 3.1.

Thu, Feb 15, 9:24 PM · Data-Engineering (Sprint 9)
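A minimal sketch of choosing between the two access paths discussed above. It takes the comment's 3.3 cutoff for Spark SQL metadata-table access as given and only builds the query string; actually running it (for example with `spark.sql(...)`) and the Iceberg Java API fallback for Spark 3.1 are left out. The `<table>.<metadata_table>` naming follows the Iceberg Spark queries docs linked above; the helper names are hypothetical.

```python
import re
from typing import Optional, Tuple

# A subset of the Iceberg metadata tables described in the Spark queries docs.
KNOWN_METADATA_TABLES = {"history", "snapshots", "files", "manifests", "partitions"}


def spark_major_minor(version: str) -> Tuple[int, int]:
    """Extract (major, minor) from a version string such as '3.1.2'."""
    match = re.match(r"(\d+)\.(\d+)", version)
    if match is None:
        raise ValueError(f"unrecognized Spark version: {version!r}")
    return int(match.group(1)), int(match.group(2))


def supports_sql_metadata_tables(spark_version: str) -> bool:
    # Cutoff taken from the comment above: Spark SQL access needs 3.3+.
    return spark_major_minor(spark_version) >= (3, 3)


def metadata_query(table: str, metadata_table: str, spark_version: str) -> Optional[str]:
    """Return a Spark SQL query for an Iceberg metadata table, or None when
    the Spark version is too old and the Java API path is needed instead."""
    if metadata_table not in KNOWN_METADATA_TABLES:
        raise ValueError(f"unknown metadata table: {metadata_table!r}")
    if not supports_sql_metadata_tables(spark_version):
        return None
    return f"SELECT * FROM {table}.{metadata_table}"
```

For example, `metadata_query("wmf_dumps.wikitext_raw_rc2", "partitions", "3.3.0")` returns `SELECT * FROM wmf_dumps.wikitext_raw_rc2.partitions`, while the same call with `"3.1.2"` returns `None`. Mapping the resulting rows onto the existing partition_id format is not shown here.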
xcollazo added a comment to T356597: Investigate if the new 'Multiblocks' user blocks feature affects the mediawiki.user-blocks-change event stream.

@xcollazo we are all clear on Dumps 1 also?

Thu, Feb 15, 8:33 PM · Data Products (Data Products Sprint 09), Multiblocks, Community-Tech, Data-Engineering, Event-Platform
xcollazo updated the task description for T357684: Hook up data drift metrics into the Data Quality Framework.
Thu, Feb 15, 3:49 PM · Dumps 2.0, Data Products
xcollazo moved T357143: wmf_dumps.wikitext_raw_rc2 backfill failing with FetchFailedException from Testing to Paused on the Data Products (Data Products Sprint 09) board.
Thu, Feb 15, 3:46 PM · Data Products (Data Products Sprint 10)
xcollazo created T357684: Hook up data drift metrics into the Data Quality Framework.
Thu, Feb 15, 3:45 PM · Dumps 2.0, Data Products
xcollazo updated subscribers of T354761: Implement first set of data quality checks.

I've added @gmodena and @JAllemandou as reviewers for the MR.

Thu, Feb 15, 3:39 PM · Data Products (Data Products Sprint 09), Dumps 2.0

Tue, Feb 13

xcollazo reopened T348772: Investigate why a SELECT count(1) takes 1.4 hours to plan for wikidata_raw_rc1 as "Open".
Tue, Feb 13, 3:02 PM · Data Products (Data Products Sprint 10)
xcollazo placed T348772: Investigate why a SELECT count(1) takes 1.4 hours to plan for wikidata_raw_rc1 up for grabs.
Tue, Feb 13, 3:02 PM · Data Products (Data Products Sprint 10)
xcollazo reopened T348772: Investigate why a SELECT count(1) takes 1.4 hours to plan for wikidata_raw_rc1, a subtask of T330296: Proof of concept for MediaWiki XML content dump via Event Platform, Iceberg and Spark, as Open.
Tue, Feb 13, 3:02 PM · Data Products (Epics Timeline), Data Pipelines, Epic
xcollazo added a comment to T348772: Investigate why a SELECT count(1) takes 1.4 hours to plan for wikidata_raw_rc1.

While working on T354761 I hit this issue again. Query planning takes a long time. I don't even want to attempt a count(1).

Tue, Feb 13, 3:02 PM · Data Products (Data Products Sprint 10)

Mon, Feb 12

xcollazo added a comment to T357143: wmf_dumps.wikitext_raw_rc2 backfill failing with FetchFailedException.

Released via https://gitlab.wikimedia.org/repos/data-engineering/dumps/mediawiki-content-dump/-/packages/582.

Mon, Feb 12, 9:34 PM · Data Products (Data Products Sprint 10)

Fri, Feb 9

xcollazo moved T355139: [MW History, Dumps] Review impact of recent changes to MW DBs that remove certain columns from Blocked/Stalled/Waiting for event to Done on the Dumps-Generation board.
Fri, Feb 9, 7:24 PM · Dumps-Generation, Data Products (Data Products Sprint 07)
xcollazo moved T355139: [MW History, Dumps] Review impact of recent changes to MW DBs that remove certain columns from Backlog to Blocked/Stalled/Waiting for event on the Dumps-Generation board.
Fri, Feb 9, 7:24 PM · Dumps-Generation, Data Products (Data Products Sprint 07)
xcollazo moved T357031: Dump failure for enwiki for 20240201 from Backlog to Done on the Dumps-Generation board.
Fri, Feb 9, 7:23 PM · Data Products (Data Products Sprint 09), Dumps-Generation
xcollazo updated the task description for T354761: Implement first set of data quality checks.
Fri, Feb 9, 7:12 PM · Data Products (Data Products Sprint 09), Dumps 2.0
xcollazo moved T357143: wmf_dumps.wikitext_raw_rc2 backfill failing with FetchFailedException from To Deploy to Testing on the Data Products (Data Products Sprint 09) board.
Fri, Feb 9, 5:10 PM · Data Products (Data Products Sprint 10)
xcollazo moved T357143: wmf_dumps.wikitext_raw_rc2 backfill failing with FetchFailedException from In Process to To Deploy on the Data Products (Data Products Sprint 09) board.
Fri, Feb 9, 5:10 PM · Data Products (Data Products Sprint 10)
xcollazo moved T357143: wmf_dumps.wikitext_raw_rc2 backfill failing with FetchFailedException from Sprint Backlog to In Process on the Data Products (Data Products Sprint 09) board.
Fri, Feb 9, 3:49 PM · Data Products (Data Products Sprint 10)