Populate the page_props table on Wikidata with wb-identifiers
Closed, Resolved, Public

Description

In T114617 the wb-identifiers property was introduced in the page_props table. This is probably going to be deployed sometime in May if all goes well. When an item gets edited, the page_props table will be updated. This still leaves us with a lot of items without the page_prop set for a long time. These items should be purged in batches, but not so fast that it overloads the infrastructure. This will make the numbers available in SQL, but not in SPARQL; for that we have T144476.

In the past I did this with a query at https://tools.wmflabs.org/multichill/queries/wikidata/no_pageprops.sql (already updated with the new property) and a (pywiki)bot working on this in the background. I could do this again.
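A minimal sketch of what such a throttled purge run could look like, not the actual bot. It assumes the item titles (e.g. "Q42") come from the query above, that purging goes through the MediaWiki action API (`action=purge` with `forcelinkupdate`, sent as a POST), and that the links update is what rewrites page_props. The helper names, the batch size of 50 and the 5 second pause are hypothetical; `send` is a placeholder for whatever actually posts the request (pywikibot, requests, ...).

```python
import time
from typing import Callable, Dict, Iterable, Iterator, List


def batches(titles: Iterable[str], size: int) -> Iterator[List[str]]:
    """Yield consecutive lists of at most `size` titles."""
    batch: List[str] = []
    for title in titles:
        batch.append(title)
        if len(batch) == size:
            yield batch
            batch = []
    if batch:
        yield batch


def purge_params(titles: List[str]) -> Dict[str, str]:
    """Build the POST parameters for one action=purge request."""
    return {
        "action": "purge",
        "titles": "|".join(titles),
        # Assumption: forcing the links update is what refreshes page_props.
        "forcelinkupdate": "1",
        "format": "json",
    }


def run_purge(
    titles: Iterable[str],
    send: Callable[[Dict[str, str]], object],
    batch_size: int = 50,          # hypothetical default
    pause_seconds: float = 5.0,    # hypothetical default, tune to server load
    sleep: Callable[[float], object] = time.sleep,
) -> int:
    """Purge all titles batch by batch, pausing after each request.

    Returns the number of titles handed to `send`.
    """
    done = 0
    for batch in batches(titles, batch_size):
        send(purge_params(batch))
        done += len(batch)
        sleep(pause_seconds)
    return done


if __name__ == "__main__":
    # Dry run: collect the requests instead of sending them.
    requests_made: List[Dict[str, str]] = []
    total = run_purge(["Q1", "Q2", "Q3"], requests_made.append,
                      batch_size=2, pause_seconds=0)
    print(total, requests_made)
```

Keeping the pause as a parameter makes it easy to slow down or speed up the run depending on how the databases are doing.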

Event Timeline

That makes sense to me. Thank you! :)
CCing @daniel for a sanity check.

Makes sense, and should be fine.

However, purging all these pages is quite expensive. We could calculate the number and write it to page_props directly. That's a little bit hackish, but a lot faster, and not too horrible. It does need more code to be written, but that shouldn't take more than a day or two. Worth it? I don't know.

Is it worth the effort? Is anyone available to actually do this? Sounds nice, but if it means waiting for many months because this has no priority, I'd rather just run a purge bot.

Yeah, in that case let's just run the bot.

@Multichill You can go ahead with the bot run. The feature is deployed.

Given that the codfw database is under pressure right now, I suggest starting the process tomorrow, especially if it's going to be intense purging. If the speed is low or you are not running it at peak time, I'd say go ahead.

I haven't done anything yet, but quite a few pages already have the page_prop:

MariaDB [wikidatawiki_p]> SELECT COUNT(*) FROM page_props WHERE pp_propname='wb-identifiers' LIMIT 1;
+----------+
| COUNT(*) |
+----------+
|   673659 |
+----------+
1 row in set (0.66 sec)

Some heavy job running that is updating a lot of pages? I'll take it easy with the run.

The bot is slowly purging the items. We're now at around 7 million done. No load issues or any other problems detected. I could speed up, but right now it's stable and I don't want to risk any issues.

I'm pretty sure this is done. Please re-open if that's not the case.