Thu, Apr 18
For some reason, we are reattempting the 20240401 commonswiki dump, and it is failing with the same issue.
In T351117#9726548, @JAllemandou wrote: I think @Ottomata 's idea is good: having another column makes it easy to keep the "monotonic" values, while still having a de-duplication key with the new field.
Wed, Apr 17
Just did the sanity test on an-test-client1002, @xcollazo, following the guide in the linked comment, and it looks good to me.
In T362648#9721558, @Stevemunene wrote: New package installs correctly and the conda functionality seems unaffected.
stevemunene@an-test-client1002:~$ conda-analytics-clone bullseye-test
Creating new cloned conda env bullseye-test...
Source: /opt/conda-analytics
Destination: /home/stevemunene/.conda/envs/bullseye-test
The following packages cannot be cloned out of the root environment:
- conda-forge/linux-64::conda-23.10.0-py310hff52083_1
- conda-forge/noarch::conda-libmamba-solver-23.12.0-pyhd8ed1ab_0
Packages: 223 Files: 1248
...
Wed 17 Apr 2024 07:43:56 AM UTC
Created user conda environment bullseye-test
To activate this environment with vanilla conda run:
source /opt/conda-analytics/etc/profile.d/conda.sh
conda activate bullseye-test
Alternatively, you can use the conda-analytic helper script:
source conda-analytics-activate bullseye-test
Tue, Apr 16
@Eevans I believe you are the owner of the production Cassandra instance.
Mon, Apr 15
Not much else to do here. For this month, there will be no commonswiki full-history dump (i.e., "All pages with complete page edit history").
Unfortunately, after running for more than two days, the commonswiki dump got stuck again with the same problem as in the description, against the same file.
Fri, Apr 12
Here are the steps I took following https://wikitech.wikimedia.org/wiki/Dumps/Rerunning_a_job#Rerunning_a_complete_dump:
Wrote the CREATE TABLE statements according to the spec, and validated them against a local Cassandra instance.
Thu, Apr 11
Wed, Apr 10
Tue, Apr 9
@BTullis can you please update https://wikitech.wikimedia.org/wiki/Dumps/Dumpsdata_hosts once this task is done?
Thu, Apr 4
Wed, Mar 27
This is looking pretty cool!
Mon, Mar 25
Ah, good find!
Mar 8 2024
Let's maybe pair on it?
Mar 7 2024
IIUC, the need for py4j is tied only to the fact that we developed helper code, such as HivePartition and DeequAnalyzersToDataQualityMetrics, that we'd like to reuse, correct?
Mar 5 2024
On Monday, March 4, we had a meeting with @mforns, @VirginiaPoundstone, and @FRomeo_WMF, where we discussed using GitLab to keep the allow list. I explained briefly how that might work, but here is a detailed proposal:
Mar 1 2024
I think all the asks from the current run of comments have been addressed in the document.
A summary of the original issues, for closure:
Feb 29 2024
This work will be critical for productionizing Dumps 2.0.
Ah, got it! Thank you both!
Feb 28 2024
In T342911#9570546, @nshahquinn-wmf wrote: The most recent run of this job (which finished today) still had a retry.
...
Should we expect duplicate data in mediawiki_wikitext_history or has that been cleaned up?
@lbowmaker clickstream_monthly_dag.py sensors typically take until the 3rd of the month to succeed, so we have about 4 days until this breaks.
All dumps marked as complete now.
Feb 27 2024
Reran the query, but this time on the new stat1011:
Most dumps now marked as "Dump complete".
Feb 26 2024
https://dumps.wikimedia.org/commonswiki/20240220/ showing progress.
Another node has picked up the job:
dumpsgen@snapshot1010:/mnt/dumpsdata/xmldatadumps/private/commonswiki$ cat lock_20240220
snapshot1011.eqiad.wmnet 4038
As per https://wikitech.wikimedia.org/wiki/Dumps/Troubleshooting, we should kill the offending commonswiki dump job, and systemd should restart it automatically.
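The lock-file handling behind this step can be sketched in shell. This is only an illustrative sketch: the lock-file format (`<hostname> <pid>`) is taken from the `cat lock_20240220` output above, the `LOCK` value is hard-coded for demonstration, and the actual kill command on the snapshot host is deliberately omitted.

```shell
# Illustrative sketch only: parse the dump lock file ("<hostname> <pid>")
# to find which snapshot host and process currently hold the commonswiki run.
LOCK="snapshot1011.eqiad.wmnet 4038"   # contents of lock_20240220, as seen above
HOST=$(echo "$LOCK" | awk '{print $1}')
PID=$(echo "$LOCK" | awk '{print $2}')
echo "dump held by $HOST (pid $PID)"
# Per the Troubleshooting guide, the next step would be to kill that PID on
# $HOST and let systemd restart the dump job; that command is omitted here.
```

Running this prints which host holds the lock, which is how one confirms where the offending job lives before killing it.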