
Thu, Apr 18

xcollazo moved T362454: commonswiki dump failure for 20240401 from In Process to Done on the Data Products (Data Products Sprint 12) board.
Thu, Apr 18, 4:31 PM · Data Products (Data Products Sprint 12), Dumps-Generation
xcollazo added a comment to T362454: commonswiki dump failure for 20240401.

So one more time:

Thu, Apr 18, 4:29 PM · Data Products (Data Products Sprint 12), Dumps-Generation
xcollazo moved T362454: commonswiki dump failure for 20240401 from Done to In Process on the Data Products (Data Products Sprint 12) board.
Thu, Apr 18, 4:21 PM · Data Products (Data Products Sprint 12), Dumps-Generation
xcollazo added a comment to T362454: commonswiki dump failure for 20240401.

For some reason, we are reattempting the 20240401 commonswiki dump, and it is failing with the same issue.

Thu, Apr 18, 4:20 PM · Data Products (Data Products Sprint 12), Dumps-Generation
xcollazo added a comment to T351117: Move analytics log from Varnish to HAProxy.

I think @Ottomata's idea is good: having another column makes it easy to keep the "monotonic" values, while still having a de-duplication key with the new field.

Thu, Apr 18, 4:03 PM · Data Products, Patch-For-Review, Data-Engineering, Observability-Logging, Traffic
xcollazo moved T358681: [Commons Impact Metrics] Productionize SparkSQL and Spark-Scala from In Process to To Deploy on the Data Products (Data Products Sprint 12) board.
Thu, Apr 18, 3:53 PM · Data Products (Data Products Sprint 12), Patch-For-Review, Commons-Impact-Metrics

Wed, Apr 17

xcollazo added a comment to T362648: Rebuild conda-analytics container on Bullseye.

Just did the sanity test on an-test-client1002, @xcollazo, following the guide in the linked comment, and it looks good to me.

Wed, Apr 17, 7:34 PM · Data-Platform-SRE (2024.04.15 - 2024.05.05)
xcollazo added a comment to T362648: Rebuild conda-analytics container on Bullseye.

New package installs correctly and the conda functionality seems unaffected.

stevemunene@an-test-client1002:~$ conda-analytics-clone bullseye-test
Creating new cloned conda env bullseye-test...
Source:      /opt/conda-analytics
Destination: /home/stevemunene/.conda/envs/bullseye-test
The following packages cannot be cloned out of the root environment:
 - conda-forge/linux-64::conda-23.10.0-py310hff52083_1
 - conda-forge/noarch::conda-libmamba-solver-23.12.0-pyhd8ed1ab_0
Packages: 223
Files: 1248
.
.
..
.
.
.
.
Wed 17 Apr 2024 07:43:56 AM UTC Created user conda environment bullseye-test

To activate this environment with vanilla conda run:
  source /opt/conda-analytics/etc/profile.d/conda.sh
  conda activate bullseye-test

Alternatively, you can use the conda-analytic helper script:
  source conda-analytics-activate bullseye-test

Wed, Apr 17, 1:36 PM · Data-Platform-SRE (2024.04.15 - 2024.05.05)

Tue, Apr 16

xcollazo updated the task description for T362697: Create Cassandra tables for Commons Impact Metrics.
Tue, Apr 16, 7:00 PM · Cassandra, Data Products (Data Products Sprint 12), Commons-Impact-Metrics
xcollazo updated subscribers of T362697: Create Cassandra tables for Commons Impact Metrics.

@Eevans I believe you are the owner of the production Cassandra instance.

Tue, Apr 16, 4:45 PM · Cassandra, Data Products (Data Products Sprint 12), Commons-Impact-Metrics
xcollazo created T362697: Create Cassandra tables for Commons Impact Metrics.
Tue, Apr 16, 4:43 PM · Cassandra, Data Products (Data Products Sprint 12), Commons-Impact-Metrics
xcollazo updated the task description for T358673: [Epic] Commons Impact Metrics Implementation.
Tue, Apr 16, 1:26 PM · Data Products (Epics Timeline), Commons-Impact-Metrics

Mon, Apr 15

xcollazo moved T362454: commonswiki dump failure for 20240401 from In Process to Done on the Data Products (Data Products Sprint 12) board.
Mon, Apr 15, 5:03 PM · Data Products (Data Products Sprint 12), Dumps-Generation
xcollazo added a comment to T362454: commonswiki dump failure for 20240401.

Not much else to do here. This month, there will be no full commonswiki dump (i.e., "All pages with complete page edit history").

Mon, Apr 15, 5:02 PM · Data Products (Data Products Sprint 12), Dumps-Generation
xcollazo moved T362454: commonswiki dump failure for 20240401 from Sprint Backlog to In Process on the Data Products (Data Products Sprint 12) board.
Mon, Apr 15, 2:21 PM · Data Products (Data Products Sprint 12), Dumps-Generation
xcollazo added a comment to T362454: commonswiki dump failure for 20240401.

Unfortunately, after running for 2+ days, the commonswiki dump got stuck again with the same problem as in the description, on the same file.

Mon, Apr 15, 2:14 PM · Data Products (Data Products Sprint 12), Dumps-Generation

Fri, Apr 12

xcollazo added a comment to T362454: commonswiki dump failure for 20240401.

Here are the steps I took following https://wikitech.wikimedia.org/wiki/Dumps/Rerunning_a_job#Rerunning_a_complete_dump:

Fri, Apr 12, 10:10 PM · Data Products (Data Products Sprint 12), Dumps-Generation
xcollazo created T362454: commonswiki dump failure for 20240401.
Fri, Apr 12, 10:09 PM · Data Products (Data Products Sprint 12), Dumps-Generation
xcollazo added a comment to T358707: [Commons Impact Metrics] Create Airflow job that formats and loads the data to Cassandra for AQS.

Wrote the CREATE TABLE statements according to the spec, and validated them against a local Cassandra instance.

Fri, Apr 12, 7:51 PM · Data Products (Data Products Sprint 12), Commons-Impact-Metrics

Thu, Apr 11

xcollazo moved T358707: [Commons Impact Metrics] Create Airflow job that formats and loads the data to Cassandra for AQS from Sprint Backlog to In Process on the Data Products (Data Products Sprint 11) board.
Thu, Apr 11, 7:39 PM · Data Products (Data Products Sprint 12), Commons-Impact-Metrics
xcollazo claimed T358707: [Commons Impact Metrics] Create Airflow job that formats and loads the data to Cassandra for AQS.
Thu, Apr 11, 7:38 PM · Data Products (Data Products Sprint 12), Commons-Impact-Metrics
xcollazo moved T358681: [Commons Impact Metrics] Productionize SparkSQL and Spark-Scala from In Process to Code Review / Tech Input on the Data Products (Data Products Sprint 11) board.
Thu, Apr 11, 6:38 PM · Data Products (Data Products Sprint 12), Patch-For-Review, Commons-Impact-Metrics

Wed, Apr 10

xcollazo reassigned T358699: [Commons Impact Metrics] Create Airflow job that generates the datasets in Iceberg from Milimetric to mforns.
Wed, Apr 10, 4:44 PM · Data Products (Data Products Sprint 12), Patch-For-Review, Commons-Impact-Metrics
xcollazo moved T356748: Adding a AQS 2.0 endpoint guide from In Process to Code Review / Tech Input on the Data Products (Data Products Sprint 11) board.
Wed, Apr 10, 4:44 PM · Data Products, AQS2.0

Tue, Apr 9

xcollazo added a comment to T325232: Migrate Dumpsdata and Htmldumper Hosts From Buster to Bullseye.

@BTullis can you please update https://wikitech.wikimedia.org/wiki/Dumps/Dumpsdata_hosts once this task is done?

Tue, Apr 9, 3:54 PM · Data-Platform-SRE (2024.03.25 - 2024.04.14), Patch-For-Review, Dumps-Generation

Thu, Apr 4

xcollazo moved T358458: 20240220 database backup dump appears stuck from Active to Done on the Dumps-Generation board.
Thu, Apr 4, 4:50 PM · User-brennen, Data Products (Data Products Sprint 10), Dumps-Generation

Wed, Mar 27

xcollazo updated Other Assignee for T358681: [Commons Impact Metrics] Productionize SparkSQL and Spark-Scala, added: mforns.
Wed, Mar 27, 4:12 PM · Data Products (Data Products Sprint 12), Patch-For-Review, Commons-Impact-Metrics
xcollazo added a comment to T353940: We should provide DQ integration with Python.

This is looking pretty cool!

Wed, Mar 27, 2:50 PM · Data-Engineering (Q4 2024 April 1st - June 30th)

Mon, Mar 25

xcollazo added a comment to T348958: Bump memory to enable large artifacts sync on HDFS.

Ah, good find!

Mon, Mar 25, 4:49 PM · Structured-Data-Backlog, Data-Engineering

Mar 8 2024

xcollazo added a comment to T353940: We should provide DQ integration with Python.

Let's maybe pair on it?

Mar 8 2024, 4:36 PM · Data-Engineering (Q4 2024 April 1st - June 30th)

Mar 7 2024

xcollazo moved T358695: [Commons Impact Metrics] Establish how we represent the allow-list from Sign Off to Done on the Data Products (Data Products Sprint 10) board.
Mar 7 2024, 7:41 PM · Data Products (Data Products Sprint 10), Commons-Impact-Metrics
xcollazo updated the task description for T358695: [Commons Impact Metrics] Establish how we represent the allow-list.
Mar 7 2024, 7:40 PM · Data Products (Data Products Sprint 10), Commons-Impact-Metrics
xcollazo added a comment to T353940: We should provide DQ integration with Python.

IIUC, the necessity for py4j is only tied to the fact that we developed helper code like the case of HivePartition and DeequAnalyzersToDataQualityMetrics that we'd like to reuse, correct?

Mar 7 2024, 7:36 PM · Data-Engineering (Q4 2024 April 1st - June 30th)

Mar 5 2024

xcollazo moved T358695: [Commons Impact Metrics] Establish how we represent the allow-list from In Process to Code Review / Tech Input on the Data Products (Data Products Sprint 10) board.
Mar 5 2024, 6:58 PM · Data Products (Data Products Sprint 10), Commons-Impact-Metrics
xcollazo moved T358695: [Commons Impact Metrics] Establish how we represent the allow-list from Sprint Backlog to In Process on the Data Products (Data Products Sprint 10) board.
Mar 5 2024, 5:18 PM · Data Products (Data Products Sprint 10), Commons-Impact-Metrics
xcollazo changed the status of T358695: [Commons Impact Metrics] Establish how we represent the allow-list, a subtask of T358673: [Epic] Commons Impact Metrics Implementation, from Open to In Progress.
Mar 5 2024, 5:18 PM · Data Products (Epics Timeline), Commons-Impact-Metrics
xcollazo changed the status of T358695: [Commons Impact Metrics] Establish how we represent the allow-list from Open to In Progress.
Mar 5 2024, 5:17 PM · Data Products (Data Products Sprint 10), Commons-Impact-Metrics
xcollazo updated subscribers of T358695: [Commons Impact Metrics] Establish how we represent the allow-list.

On Monday, March 4, we had a meeting with @mforns, @VirginiaPoundstone and @FRomeo_WMF where we discussed using GitLab to keep the allow list. I explained briefly how that may work, but here is a detailed proposal:

Mar 5 2024, 5:10 PM · Data Products (Data Products Sprint 10), Commons-Impact-Metrics
xcollazo claimed T358695: [Commons Impact Metrics] Establish how we represent the allow-list.
Mar 5 2024, 4:59 PM · Data Products (Data Products Sprint 10), Commons-Impact-Metrics

Mar 1 2024

xcollazo moved T358120: Write a Dumps 2.0 requirements doc with emphasis on a production intermediate table from Code Review / Tech Input to Sign Off on the Data Products (Data Products Sprint 10) board.
Mar 1 2024, 8:34 PM · Data Products (Data Products Sprint 10)
xcollazo moved T358120: Write a Dumps 2.0 requirements doc with emphasis on a production intermediate table from In Process to Code Review / Tech Input on the Data Products (Data Products Sprint 10) board.
Mar 1 2024, 8:34 PM · Data Products (Data Products Sprint 10)
xcollazo updated the task description for T358120: Write a Dumps 2.0 requirements doc with emphasis on a production intermediate table.
Mar 1 2024, 8:34 PM · Data Products (Data Products Sprint 10)
xcollazo added a comment to T358120: Write a Dumps 2.0 requirements doc with emphasis on a production intermediate table.

I think all the asks from the current run of comments have been addressed in the document.

Mar 1 2024, 8:33 PM · Data Products (Data Products Sprint 10)
xcollazo updated the task description for T358120: Write a Dumps 2.0 requirements doc with emphasis on a production intermediate table.
Mar 1 2024, 8:32 PM · Data Products (Data Products Sprint 10)
xcollazo created T358886: Decision records for Dumps 2.0.
Mar 1 2024, 5:35 PM · Data Products (Data Products Sprint 13), Epic
xcollazo created T358883: Define SLOs for the intermediate table of Dumps 2.0.
Mar 1 2024, 5:33 PM · Data Products (Data Products Sprint 13), Epic
xcollazo added a subtask for T358877: Dumps 2.0 - Production intermediate table milestone: T358375: Declare wmf_dumps.wikitext_raw a production table.
Mar 1 2024, 5:27 PM · Data Products (Epics Timeline), Dumps 2.0, Epic
xcollazo added a parent task for T358375: Declare wmf_dumps.wikitext_raw a production table: T358877: Dumps 2.0 - Production intermediate table milestone.
Mar 1 2024, 5:27 PM · Data Products (Data Products Sprint 13)
xcollazo added a subtask for T358877: Dumps 2.0 - Production intermediate table milestone: T358374: Remove historical errors from errors column on wmf_dumps.wikitext_raw_rc2 intermediate table.
Mar 1 2024, 5:27 PM · Data Products (Epics Timeline), Dumps 2.0, Epic
xcollazo added a parent task for T358374: Remove historical errors from errors column on wmf_dumps.wikitext_raw_rc2 intermediate table: T358877: Dumps 2.0 - Production intermediate table milestone.
Mar 1 2024, 5:27 PM · Data Products (Data Products Sprint 13)
xcollazo added a subtask for T358877: Dumps 2.0 - Production intermediate table milestone: T358366: Consult with Product and Research team on schema and data retention expectations for wmf_dumps.wikitext_raw.
Mar 1 2024, 5:27 PM · Data Products (Epics Timeline), Dumps 2.0, Epic
xcollazo added a parent task for T358366: Consult with Product and Research team on schema and data retention expectations for wmf_dumps.wikitext_raw: T358877: Dumps 2.0 - Production intermediate table milestone.
Mar 1 2024, 5:27 PM · Data Products (Data Products Sprint 13)
xcollazo added a subtask for T358877: Dumps 2.0 - Production intermediate table milestone: T358373: PySpark job to detect and fetch missing/corrupted revisions.
Mar 1 2024, 5:26 PM · Data Products (Epics Timeline), Dumps 2.0, Epic
xcollazo added a parent task for T358373: PySpark job to detect and fetch missing/corrupted revisions: T358877: Dumps 2.0 - Production intermediate table milestone.
Mar 1 2024, 5:26 PM · Data Products (Data Products Sprint 13)
xcollazo added a subtask for T358877: Dumps 2.0 - Production intermediate table milestone: T358365: Implement dataset maintenance config for wmf_dumps.wikitext_raw.
Mar 1 2024, 5:26 PM · Data Products (Epics Timeline), Dumps 2.0, Epic
xcollazo added a parent task for T358365: Implement dataset maintenance config for wmf_dumps.wikitext_raw: T358877: Dumps 2.0 - Production intermediate table milestone.
Mar 1 2024, 5:26 PM · Data Products (Data Products Sprint 13)
xcollazo added a subtask for T358877: Dumps 2.0 - Production intermediate table milestone: T338065: [Iceberg Migration] Implement mechanism for automatic Iceberg data deletion and optimization.
Mar 1 2024, 5:26 PM · Data Products (Epics Timeline), Dumps 2.0, Epic
xcollazo added a parent task for T338065: [Iceberg Migration] Implement mechanism for automatic Iceberg data deletion and optimization: T358877: Dumps 2.0 - Production intermediate table milestone.
Mar 1 2024, 5:26 PM · Data-Engineering
xcollazo added a parent task for T340466: [Iceberg Migration] P.O.C. on Iceberg sensor using Postgres table to keep status of updates: T358877: Dumps 2.0 - Production intermediate table milestone.
Mar 1 2024, 5:25 PM · Data-Platform-SRE, Data-Engineering
xcollazo added a subtask for T358877: Dumps 2.0 - Production intermediate table milestone: T340466: [Iceberg Migration] P.O.C. on Iceberg sensor using Postgres table to keep status of updates.
Mar 1 2024, 5:25 PM · Data Products (Epics Timeline), Dumps 2.0, Epic
xcollazo added a subtask for T358877: Dumps 2.0 - Production intermediate table milestone: T356866: [Data Quality] Update data_quality schemas to be compatible with Iceberg tables.
Mar 1 2024, 5:25 PM · Data Products (Epics Timeline), Dumps 2.0, Epic
xcollazo added a parent task for T356866: [Data Quality] Update data_quality schemas to be compatible with Iceberg tables: T358877: Dumps 2.0 - Production intermediate table milestone.
Mar 1 2024, 5:25 PM · Data-Engineering (Q4 2024 April 1st - June 30th)
xcollazo added a subtask for T358877: Dumps 2.0 - Production intermediate table milestone: T345195: [Data Quality] [SPIKE] Can we identify indicators to inform an SLO for event emission and intake?.
Mar 1 2024, 5:25 PM · Data Products (Epics Timeline), Dumps 2.0, Epic
xcollazo added a parent task for T345195: [Data Quality] [SPIKE] Can we identify indicators to inform an SLO for event emission and intake?: T358877: Dumps 2.0 - Production intermediate table milestone.
Mar 1 2024, 5:25 PM · Data-Engineering, Event-Platform
xcollazo added a subtask for T358877: Dumps 2.0 - Production intermediate table milestone: T354761: Implement first set of data quality checks.
Mar 1 2024, 5:22 PM · Data Products (Epics Timeline), Dumps 2.0, Epic
xcollazo added a parent task for T354761: Implement first set of data quality checks: T358877: Dumps 2.0 - Production intermediate table milestone.
Mar 1 2024, 5:22 PM · Data Products (Data Products Sprint 09), Dumps 2.0
xcollazo closed T345440: Make it easier to run custom Spark versions via for_virtual_env() as Resolved.

A summary of the original issues, for closure:

Mar 1 2024, 4:14 PM · Data Products, Dumps 2.0
xcollazo closed T345440: Make it easier to run custom Spark versions via for_virtual_env(), a subtask of T330296: Proof of concept for MediaWiki XML content dump via Event Platform, Iceberg and Spark, as Resolved.
Mar 1 2024, 4:14 PM · Data Products (Epics Timeline), Data Pipelines, Epic
xcollazo removed a subtask for T330296: Proof of concept for MediaWiki XML content dump via Event Platform, Iceberg and Spark: T351564: Implement enriched revision visibility stream.
Mar 1 2024, 4:04 PM · Data Products (Epics Timeline), Data Pipelines, Epic
xcollazo added a subtask for T358877: Dumps 2.0 - Production intermediate table milestone: T351564: Implement enriched revision visibility stream.
Mar 1 2024, 4:04 PM · Data Products (Epics Timeline), Dumps 2.0, Epic
xcollazo edited parent tasks for T351564: Implement enriched revision visibility stream, added: T358877: Dumps 2.0 - Production intermediate table milestone; removed: T330296: Proof of concept for MediaWiki XML content dump via Event Platform, Iceberg and Spark.
Mar 1 2024, 4:04 PM · Dumps 2.0, Data Products
xcollazo removed a subtask for T346378: Update XML dump generation code to use wmf_dumps.wikitext_raw_rc1 schema.: T347611: Document new wmf_dumps tables.
Mar 1 2024, 4:02 PM · Data Products (Sprint 02), Dumps 2.0
xcollazo added a subtask for T358877: Dumps 2.0 - Production intermediate table milestone: T347611: Document new wmf_dumps tables.
Mar 1 2024, 4:02 PM · Data Products (Epics Timeline), Dumps 2.0, Epic
xcollazo edited parent tasks for T347611: Document new wmf_dumps tables, added: T358877: Dumps 2.0 - Production intermediate table milestone; removed: T346378: Update XML dump generation code to use wmf_dumps.wikitext_raw_rc1 schema..
Mar 1 2024, 4:02 PM · Data Products, Documentation, Dumps 2.0
xcollazo created T358877: Dumps 2.0 - Production intermediate table milestone.
Mar 1 2024, 4:01 PM · Data Products (Epics Timeline), Dumps 2.0, Epic
xcollazo awarded T358691: Hadoop datanode on an-worker1173 is showing errors a Pterodactyl token.
Mar 1 2024, 3:26 PM · Data-Platform-SRE (2024.02.12 - 2024.03.03)

Feb 29 2024

xcollazo renamed T330296: Proof of concept for MediaWiki XML content dump via Event Platform, Iceberg and Spark from Make MediaWiki XML content dump available for external consumption to Proof of concept for MediaWiki XML content dump via Event Platform, Iceberg and Spark.
Feb 29 2024, 6:44 PM · Data Products (Epics Timeline), Data Pipelines, Epic
xcollazo added a comment to T345195: [Data Quality] [SPIKE] Can we identify indicators to inform an SLO for event emission and intake?.

This work will be critical for productionizing Dumps 2.0.

Feb 29 2024, 3:55 PM · Data-Engineering, Event-Platform
xcollazo added a comment to T355588: Modify ClickStreamBuilder pipeline to cope with pagelinks schema changes.

Ah, got it! Thank you both!

Feb 29 2024, 3:29 PM · Data-Engineering (Q4 2024 April 1st - June 30th), Data Products

Feb 28 2024

xcollazo placed T342911: Data Quality Issue: Wikitext History Job fail / rerun in Airflow up for grabs.

The most recent run of this job (which finished today) still had a retry.
...
Should we expect duplicate data in mediawiki_wikitext_history or has that been cleaned up?

Feb 28 2024, 10:20 PM · Data-Engineering (Q4 2024 April 1st - June 30th), Data Products, Movement-Metrics, Movement-Insights
xcollazo updated subscribers of T355588: Modify ClickStreamBuilder pipeline to cope with pagelinks schema changes.

@lbowmaker clickstream_monthly_dag.py sensors typically take till the 3rd of the month to succeed, so we have about 4 days till this breaks.
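
For reference, the "about 4 days" estimate can be checked with a quick date calculation. This is only a sketch: the helper name `days_until_day_of_next_month` is made up for illustration, and it assumes the sensors succeed on the 3rd of the following month, as the comment says they typically do.

```python
from datetime import date


def days_until_day_of_next_month(today: date, day: int) -> int:
    """Days from `today` until `day` of the following calendar month."""
    # Roll over to January of the next year when today is in December.
    if today.month == 12:
        year, month = today.year + 1, 1
    else:
        year, month = today.year, today.month + 1
    return (date(year, month, day) - today).days


# Comment date: Feb 28 2024 (a leap year); sensors assumed to succeed on the 3rd.
days_left = days_until_day_of_next_month(date(2024, 2, 28), 3)
print(days_left)
```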

Feb 28 2024, 10:16 PM · Data-Engineering (Q4 2024 April 1st - June 30th), Data Products
xcollazo moved T355599: [SPIKE] Draft of Mediawiki extension proposal for Metrics Platform Instrumentation (& Experimentation) from In Process to Code Review / Tech Input on the Data Products (Data Products Sprint 10) board.
Feb 28 2024, 5:17 PM · Data Products (Data Products Sprint 10), Data-Engineering, Event-Platform (Sprint 09), Metrics Platform Backlog
xcollazo moved T358120: Write a Dumps 2.0 requirements doc with emphasis on a production intermediate table from Paused to In Process on the Data Products (Data Products Sprint 10) board.
Feb 28 2024, 5:12 PM · Data Products (Data Products Sprint 10)
xcollazo moved T350497: Update the WikiLambda instrumentation to use core interaction events from Code Review / Tech Input to In Process on the Data Products (Data Products Sprint 10) board.
Feb 28 2024, 5:12 PM · Patch-For-Review, MW-1.42-notes (1.42.0-wmf.26; 2024-04-09), Data Products (Data Products Sprint 11), Abstract Wikipedia team, WikiLambda Front-end, Metrics Platform Backlog
xcollazo moved T355409: AQS 2.0: Aqsassist and test envs. Make changes corresponding to mediawiki history reduced snapshot automation from Code Review / Tech Input to To Deploy on the Data Products (Data Products Sprint 10) board.
Feb 28 2024, 5:09 PM · Data Products (Data Products Sprint 13), AQS2.0
xcollazo claimed T348772: Investigate why a SELECT count(1) takes 1.4 hours to plan for wikidata_raw_rc1.
Feb 28 2024, 4:27 PM · Data Products (Data Products Sprint 10)
xcollazo moved T358458: 20240220 database backup dump appears stuck from In Process to Done on the Data Products (Data Products Sprint 10) board.
Feb 28 2024, 4:27 PM · User-brennen, Data Products (Data Products Sprint 10), Dumps-Generation
xcollazo added a comment to T358458: 20240220 database backup dump appears stuck.

All dumps marked as complete now.

Feb 28 2024, 4:27 PM · User-brennen, Data Products (Data Products Sprint 10), Dumps-Generation

Feb 27 2024

xcollazo moved T353787: Decom dumpsdata100[1-2] from Backlog to Other teams on the Dumps-Generation board.
Feb 27 2024, 3:56 PM · Patch-For-Review, Data-Platform-SRE (2024.03.25 - 2024.04.14), Dumps-Generation
xcollazo moved T358458: 20240220 database backup dump appears stuck from Backlog to Active on the Dumps-Generation board.
Feb 27 2024, 3:56 PM · User-brennen, Data Products (Data Products Sprint 10), Dumps-Generation
xcollazo moved T348772: Investigate why a SELECT count(1) takes 1.4 hours to plan for wikidata_raw_rc1 from Sprint Backlog to Done on the Data Products (Data Products Sprint 10) board.
Feb 27 2024, 3:51 PM · Data Products (Data Products Sprint 10)
xcollazo updated the task description for T347611: Document new wmf_dumps tables.
Feb 27 2024, 3:51 PM · Data Products, Documentation, Dumps 2.0
xcollazo added a comment to T348772: Investigate why a SELECT count(1) takes 1.4 hours to plan for wikidata_raw_rc1.

Reran the query, but this time on the new stat1011:

Feb 27 2024, 3:50 PM · Data Products (Data Products Sprint 10)
xcollazo updated the task description for T348772: Investigate why a SELECT count(1) takes 1.4 hours to plan for wikidata_raw_rc1.
Feb 27 2024, 3:47 PM · Data Products (Data Products Sprint 10)
xcollazo edited projects for T348772: Investigate why a SELECT count(1) takes 1.4 hours to plan for wikidata_raw_rc1, added: Data Products (Data Products Sprint 10); removed Data Products.
Feb 27 2024, 3:45 PM · Data Products (Data Products Sprint 10)
xcollazo added a comment to T358458: 20240220 database backup dump appears stuck.

Most dumps now marked as "Dump complete".

Feb 27 2024, 2:46 PM · User-brennen, Data Products (Data Products Sprint 10), Dumps-Generation

Feb 26 2024

xcollazo added a comment to T358458: 20240220 database backup dump appears stuck.

https://dumps.wikimedia.org/commonswiki/20240220/ showing progress.

Feb 26 2024, 9:01 PM · User-brennen, Data Products (Data Products Sprint 10), Dumps-Generation
xcollazo added a comment to T358458: 20240220 database backup dump appears stuck.

Another node has picked up the job:

dumpsgen@snapshot1010:/mnt/dumpsdata/xmldatadumps/private/commonswiki$ cat lock_20240220 
snapshot1011.eqiad.wmnet 4038
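
A small sketch of how the lock file contents above could be read programmatically. It assumes the file holds a single line of the form "<host> <pid>", which is inferred only from the output shown here; the names `DumpLock`, `parse_lock` and `held_by_other_host` are hypothetical and not part of the dumps tooling.

```python
from typing import NamedTuple


class DumpLock(NamedTuple):
    host: str  # fully qualified name of the node holding the lock
    pid: int   # process id recorded in the lock file


def parse_lock(text: str) -> DumpLock:
    """Parse lock file contents, assumed to be '<host> <pid>'."""
    host, pid = text.split()
    return DumpLock(host=host, pid=int(pid))


def held_by_other_host(lock: DumpLock, local_short_name: str) -> bool:
    """True when the lock belongs to a node other than the local one."""
    return lock.host.split(".")[0] != local_short_name


# Contents as shown in the comment; the command was run from snapshot1010.
lock = parse_lock("snapshot1011.eqiad.wmnet 4038\n")
print(lock.host, lock.pid, held_by_other_host(lock, "snapshot1010"))
```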
Feb 26 2024, 8:42 PM · User-brennen, Data Products (Data Products Sprint 10), Dumps-Generation
xcollazo added a comment to T358458: 20240220 database backup dump appears stuck.

As per https://wikitech.wikimedia.org/wiki/Dumps/Troubleshooting, we should kill the offending commonswiki dump job, and systemd should restart it automatically.

Feb 26 2024, 8:15 PM · User-brennen, Data Products (Data Products Sprint 10), Dumps-Generation
xcollazo moved T357143: wmf_dumps.wikitext_raw_rc2 backfill failing with FetchFailedException from In Process to Done on the Data Products (Data Products Sprint 10) board.
Feb 26 2024, 6:28 PM · Data Products (Data Products Sprint 10)