Two new page properties, wbl-forms and wbl-senses, have been created to count and display the number of Forms and Senses of each Lexeme.
However, on many pages this information is still missing: the properties are only populated when the entity is edited, and most Lexemes have not been edited since the properties were introduced.
Since this issue is blocking the development of some tools, it should be solved. A script could help populate these new page properties on all Lexemes.
Description
Status | Subtype | Assigned | Task
---|---|---|---
Resolved | | Ladsgroup | T191424 track number of Lexemes, Forms and Senses
Resolved | | Ladsgroup | T199611 New page properties for number of Forms and Senses of a Lexeme
Resolved | | LucasWerkmeister | T225510 Run a script to populate number of Forms and Senses on all Lexemes
Event Timeline
I looked into this a bit – a regular purge doesn’t refresh the page props, but apparently a link-update purge does. This is available via the API – for example, I just fixed the page props for L31883 with action=purge&titles=Lexeme:L31883&forcelinkupdate=1. So I think anyone could actually do this task :)
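For anyone who wants to try this from a script, here is a minimal sketch of the same API call in Python, using only the standard library. The endpoint URL, the helper names (`build_purge_params`, `purge_with_link_update`) and the User-Agent string are assumptions made for illustration; only `action=purge`, `titles` and `forcelinkupdate` come from the comment above. It assumes the purge is sent as a POST request and that your account or IP is allowed to purge at the rate you use.

```python
import json
import urllib.parse
import urllib.request

# Assumed endpoint for Wikidata's action API; adjust for other wikis.
API_URL = "https://www.wikidata.org/w/api.php"


def build_purge_params(titles):
    """Build the parameters for a link-update purge of the given page titles."""
    return {
        "action": "purge",
        "titles": "|".join(titles),  # multiple titles are separated by "|"
        "forcelinkupdate": "1",      # a plain purge does not refresh page props
        "format": "json",
    }


def purge_with_link_update(titles, api_url=API_URL,
                           user_agent="purge-example/0.1 (hypothetical)"):
    """POST the purge request and return the decoded JSON response."""
    data = urllib.parse.urlencode(build_purge_params(titles)).encode("utf-8")
    request = urllib.request.Request(
        api_url, data=data, headers={"User-Agent": user_agent}
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)


# Example call (performs a real request, so it is left commented out):
# print(purge_with_link_update(["Lexeme:L31883"]))
```

Keeping the parameter construction separate from the network call makes it easy to check what would be sent before actually purging anything.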
I’m now running this on PAWS:
@PAWS:/srv/paws/pwb$ time python scripts/touch.py -start:Lexeme:! -purge -forcelinkupdate -family:wikidata -lang:wikidata
It sleeps for ~10 seconds between purges, so it should be done in a few days.
Mentioned in SAL (#wikimedia-cloud) [2019-06-12T08:48:48Z] <wm-bot> <lucaswerkmeister> kubectl create -f deployment-purge-all-lexemes.yaml # T225510
The PAWS terminal died for some reason (I guess you can’t leave them running in the background?), so now I’m doing it with a separate Python script from a Kubernetes deployment. Source code is on GitHub; I can’t push it to Phabricator yet due to T224677.
Sufficiently privileged users (probably only me and Toolforge admins?) can see the progress with:
kubectl logs lexeme-forms.purge-all-lexemes-2706089478-b6dcm | tail
This script purges 30 pages at a time and sleeps 75 seconds between batches, so it should be done in a bit less than 1½ days.
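The batching and the time estimate can be sketched roughly as follows. This is not the script linked from GitHub, just an illustration of the "30 pages, then sleep 75 seconds" pattern. All function names are hypothetical, and `purge_batch` stands for whatever callable actually sends the purge request for one batch.

```python
import time


def batches(titles, size=30):
    """Yield consecutive lists of at most `size` titles."""
    batch = []
    for title in titles:
        batch.append(title)
        if len(batch) == size:
            yield batch
            batch = []
    if batch:  # final, possibly shorter batch
        yield batch


def purge_all(titles, purge_batch, batch_size=30, pause_seconds=75.0,
              sleep=time.sleep):
    """Purge all titles batch by batch, pausing between batches.

    `purge_batch` is a caller-supplied function taking a list of titles.
    Returns the number of titles handed to it.
    """
    count = 0
    for index, batch in enumerate(batches(titles, batch_size)):
        if index > 0:
            sleep(pause_seconds)  # pause only between batches, not before the first
        purge_batch(batch)
        count += len(batch)
    return count


def estimated_days(total_pages, batch_size=30, pause_seconds=75.0):
    """Rough run time in days, counting only the pauses (request time ignored)."""
    number_of_batches = -(-total_pages // batch_size)  # ceiling division
    return number_of_batches * pause_seconds / 86400
```

With these numbers, 1½ days of pauses corresponds to 1,728 batches, or about 51,840 pages, so "a bit less than 1½ days" suggests slightly fewer Lexemes than that. This is back-of-the-envelope arithmetic from the figures in the comment, not a count taken from the wiki.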
Mentioned in SAL (#wikimedia-cloud) [2019-06-14T00:38:11Z] <wm-bot> <lucaswerkmeister> kubectl delete deployment lexeme-forms.purge-all-lexemes # T225510 done