
Run a script to populate number of Forms and Senses on all Lexemes
Closed, Resolved · Public

Description

Two new page properties, wbl-forms and wbl-senses, have been created to count and display the number of Forms and Senses for each Lexeme.
However, many pages do not have this information yet: the properties are only populated when the entity is edited, and most Lexemes have not been edited since the properties were introduced.
Since this issue is blocking the development of some tools, it should be solved. A script could help populate these new page properties on all Lexemes.

Event Timeline

I looked into this a bit – a regular purge doesn’t refresh the page props, but apparently a link-update purge does. This is available via the API – for example, I just fixed the page props for L31883 with action=purge&titles=Lexeme:L31883&forcelinkupdate=1. So I think anyone could actually do this task :)
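For anyone who wants to script the link-update purge described above, here is a minimal Python sketch using only the standard library. The function names, the `API_URL` constant, and the User-Agent string are placeholders I made up for illustration; only the `action=purge`, `titles`, and `forcelinkupdate=1` parameters come from the comment above.

```python
import json
import urllib.parse
import urllib.request

# Assumed endpoint: the standard MediaWiki API entry point on Wikidata.
API_URL = "https://www.wikidata.org/w/api.php"


def build_purge_params(titles):
    """Return the POST parameters for a link-update purge of the given titles."""
    return {
        "action": "purge",
        "titles": "|".join(titles),  # several titles are joined with "|"
        "forcelinkupdate": "1",      # also refreshes links and page props
        "format": "json",
    }


def purge(titles, api_url=API_URL, user_agent="purge-sketch/0.1 (example)"):
    """Send the purge request. Passing a body makes urllib issue a POST."""
    data = urllib.parse.urlencode(build_purge_params(titles)).encode("utf-8")
    request = urllib.request.Request(
        api_url, data=data, headers={"User-Agent": user_agent}
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)
```

Calling `purge(["Lexeme:L31883"])` should be roughly equivalent to the request quoted above, though I have not checked it against rate limits or error responses.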

I’m now running this on PAWS:

@PAWS:/srv/paws/pwb$ time python scripts/touch.py -start:Lexeme:! -purge -forcelinkupdate -family:wikidata -lang:wikidata

It sleeps for ~10 seconds between each purge, so it should be done in a few days.

Mentioned in SAL (#wikimedia-cloud) [2019-06-12T08:48:48Z] <wm-bot> <lucaswerkmeister> kubectl create -f deployment-purge-all-lexemes.yaml # T225510

LucasWerkmeister added a comment. Edited Jun 12 2019, 8:54 AM

The PAWS terminal died for some reason (I guess you can’t leave them running in the background?), so now I’m doing it with a separate Python script from a Kubernetes deployment. Source code is on GitHub; I can’t push it to Phabricator yet due to T224677.

Sufficiently privileged users (probably only me and Toolforge admins?) can see the progress with:

kubectl logs lexeme-forms.purge-all-lexemes-2706089478-b6dcm | tail

This script purges 30 pages at a time and sleeps 75 seconds between batches, so it should be done in a bit less than 1½ days.
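The batching pattern described here (30 pages per request, 75 seconds of sleep) could look roughly like the sketch below. This is not the actual script from GitHub; the function names are hypothetical, and `purge_fn` stands in for whatever sends one purge request for a list of titles.

```python
import math
import time


def batches(items, size):
    """Yield consecutive slices of `items`, each with at most `size` entries."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def purge_in_batches(titles, purge_fn, batch_size=30, pause_seconds=75,
                     sleep=time.sleep):
    """Call purge_fn once per batch, sleeping after each; return the batch count."""
    count = 0
    for batch in batches(titles, batch_size):
        purge_fn(batch)
        count += 1
        sleep(pause_seconds)  # `sleep` is injectable so tests need not wait
    return count


def estimated_hours(page_count, batch_size=30, pause_seconds=75):
    """Rough lower bound on the run time: sleep time only, requests ignored."""
    return math.ceil(page_count / batch_size) * pause_seconds / 3600
```

`estimated_hours` only counts the sleeps, so the real run takes somewhat longer, depending on how long each request takes.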

Mentioned in SAL (#wikimedia-cloud) [2019-06-14T00:38:11Z] <wm-bot> <lucaswerkmeister> kubectl delete deployment lexeme-forms.purge-all-lexemes # T225510 done

Okay, the Python script finished, and now there are only 107 lexemes without the page props left (Quarry 1, Quarry 2). Not sure why those few are still missing the page props… should I run the script again and see if that helps?

LucasWerkmeister closed this task as Resolved. Jun 14 2019, 5:14 PM
LucasWerkmeister claimed this task.

Ah, those 107 pages are redirects :) if you filter redirects from the results, there’s nothing left to be done (Quarry 1, Quarry 2).
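The check above was done with Quarry queries. As an alternative for spot-checking a handful of pages, something like the following sketch against the API's `prop=pageprops|info` could work. The function names are hypothetical, and the response shape assumed in `missing_props` (a `query.pages` list with optional `pageprops` and `redirect` keys, as with `formatversion=2`) should be verified against a real response before relying on it.

```python
# The two page properties named in the task description.
PROPS = ("wbl-forms", "wbl-senses")


def build_pageprops_params(titles):
    """Return query parameters asking for the two page props plus page info."""
    return {
        "action": "query",
        "prop": "pageprops|info",
        "ppprop": "|".join(PROPS),
        "titles": "|".join(titles),
        "format": "json",
        "formatversion": "2",
    }


def missing_props(response, skip_redirects=True):
    """List titles that lack either page prop, optionally ignoring redirects."""
    missing = []
    for page in response.get("query", {}).get("pages", []):
        if skip_redirects and page.get("redirect"):
            continue  # redirects never get the props, as noted above
        props = page.get("pageprops", {})
        if any(name not in props for name in PROPS):
            missing.append(page["title"])
    return missing
```

With redirects skipped, an empty result for a batch of titles would mean there is nothing left to purge in that batch.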