Page MenuHomePhabricator

Update query for finding categories on Commons with Wikidata links but no Wikidata Infobox
Closed, ResolvedPublic

Description

Until 10 September, Pi bot was happily running this query on toolforge to find cases of categories on Commons that are connected to Wikidata, but don't yet use Wikidata Infobox:

SELECT DISTINCT pp.pp_value, p1.page_title, tl.tl_namespace, tl.tl_title
 FROM categorylinks AS c1
 JOIN page AS p1 ON c1.cl_from=p1.page_id AND p1.page_namespace=14 AND p1.page_is_redirect=0
 JOIN page_props AS pp ON pp.pp_page = p1.page_id AND pp.pp_propname = 'wikibase_item'
 LEFT JOIN templatelinks AS tl ON tl.tl_from = p1.page_id AND tl.tl_from_namespace = 14 AND tl.tl_namespace = 10 AND tl.tl_title = 'Wikidata_Infobox' 
 WHERE tl.tl_title IS NULL

However, this no longer works due to the database changes with 1.39 that removed several columns from the database. Can someone suggest a replacement please?

Event Timeline

This should be equivalent, I think:

SELECT DISTINCT pp.pp_value, p1.page_title
 FROM categorylinks AS c1
 JOIN page AS p1 ON c1.cl_from=p1.page_id AND p1.page_namespace=14 AND p1.page_is_redirect=0
 JOIN page_props AS pp ON pp.pp_page = p1.page_id AND pp.pp_propname = 'wikibase_item'
 LEFT JOIN templatelinks AS tl ON tl.tl_from = p1.page_id AND tl.tl_from_namespace = 14 AND tl_target_id = (SELECT lt_id FROM linktarget WHERE lt_namespace = 10 AND lt_title = 'Wikidata_Infobox')
 WHERE tl.tl_target_id IS NULL

(I removed the templatelinks columns from the SELECT, because they would’ve been NULL anyways, right?)

taavi claimed this task.
taavi subscribed.

The change is explained in this thread: https://lists.wikimedia.org/hyperkitty/list/cloud@lists.wikimedia.org/thread/U2U6TXIBABU3KDCVUOITIGI5OJ4COBSW/. tl;dr is that tl_title and tl_namespace no longer exist, instead there's a tl_target_id field which can be used to join on the linktarget table for the same data.

In the future, please use the mailing lists or IRC channels for these kinds of questions, as Phabricator is a task tracker and not a general support forum - thanks!

Or if you don't want to use templatelinks, you can also switch to categorylinks:

SELECT DISTINCT pp.pp_value, p1.page_title
FROM categorylinks AS c1
JOIN page AS p1 ON c1.cl_from=p1.page_id AND p1.page_namespace=14 AND p1.page_is_redirect=0
JOIN page_props AS pp ON pp.pp_page = p1.page_id AND pp.pp_propname = 'wikibase_item'
LEFT JOIN categorylinks AS cl ON cl.cl_from = p1.page_id AND cl.cl_type='subcat' 
AND (cl.cl_to='Uses_of_Wikidata_Infobox' or cl.cl_to='Uses_of_Wikidata_Infobox_with_no_item' or cl.cl_to='Category_redirects' or cl.cl_to='Disambiguation_categories')
WHERE cl.cl_to IS NULL
LIMIT 10;

This also gives you the option to already filter out types of categories you don't want to work on anyway.

I don't agree with Taavi. Mike's tool broke, so he filed a task in Phabricator to track that and get it fixed. His is tool is still broken so setting the task to resolved is a bit strange.

Thanks for the replies! I've gone with Lucas's solution, since that's probably more stable than the categories, but that's also an interesting approach to take! The query ran successfully, and Pi bot's adding infoboxes again now.

I don't agree with Taavi. Mike's tool broke, so he filed a task in Phabricator to track that and get it fixed. His is tool is still broken so setting the task to resolved is a bit strange.

As far as I can tell, Pi bot issues are not tracked in Phabricator in general, or at least I didn't find a relevant project for it. I closed it because I don't want a lingering open task on the infrastructure project that is not actually a problem with the infrastructure.