Provide a mechanism for accessing the names of image files on Commons when querying another wiki
Open, Medium, Public

Description

MediaWiki allows you to use images from two sources: the local wiki, or a shared wiki. In the case of WMF wikis, Commons is the shared wiki.

To identify every place where a *non-existent* file is used in a [[File: or [[Image: tag, you need to join the imagelinks table not just to the local page table, but also to the page table on Commons. Currently, we can do this using the Wiki Replicas. This query may fail once the planned changes to the Replicas are implemented, because it will no longer be possible to join tables across wikis as is done here (see https://lists.wikimedia.org/pipermail/cloud/2020-November/001290.html and https://wikitech.wikimedia.org/wiki/News/Wiki_Replicas_2020_Redesign).

Here is an example of the query we currently use. We would need an alternative solution to be set up before the Replica Redesign goes into effect.

SELECT
  page_title,
  il_to
FROM page
JOIN imagelinks
  ON il_from = page_id
WHERE
  page_namespace = 0
  AND il_to NOT IN (
    SELECT page_title
    FROM page
    WHERE
    page_namespace = 6
  )
  AND il_to NOT IN (
    SELECT page_title
    FROM commonswiki_p.page
    WHERE page_namespace = 6
  )
LIMIT 1000
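
For discussion, here is a rough sketch of one possible workaround if cross-wiki joins stop working: run the local half of the query on one connection, then check the remaining titles against Commons on a second connection and subtract the matches in application code. This is only an illustration under stated assumptions, not a proposed implementation. The function names (`find_missing_files`, `titles_on_commons`, `make_demo_db`) are hypothetical, and the sketch uses Python's built-in `sqlite3` with in-memory tables as a stand-in for the two replica databases. A real MariaDB client would use a different connection setup and placeholder syntax.

```python
import sqlite3

# Local half of the original query: files used in articles that have
# no File: page on the local wiki. The Commons check is done separately.
LOCAL_CANDIDATES_SQL = """
SELECT p.page_title, il.il_to
FROM page AS p
JOIN imagelinks AS il ON il.il_from = p.page_id
WHERE p.page_namespace = 0
  AND il.il_to NOT IN (
    SELECT f.page_title FROM page AS f WHERE f.page_namespace = 6
  )
"""


def titles_on_commons(commons_conn, titles, batch_size=500):
    """Return the subset of `titles` that exist as File: pages on Commons."""
    found = set()
    unique_titles = sorted(set(titles))
    # Query in batches so the IN (...) list stays a manageable size.
    for start in range(0, len(unique_titles), batch_size):
        batch = unique_titles[start:start + batch_size]
        placeholders = ", ".join("?" for _ in batch)  # "?" is sqlite3 syntax
        sql = (
            "SELECT page_title FROM page "
            "WHERE page_namespace = 6 AND page_title IN (%s)" % placeholders
        )
        found.update(row[0] for row in commons_conn.execute(sql, batch))
    return found


def find_missing_files(local_conn, commons_conn, limit=1000):
    """Emulate the cross-wiki anti-join using two separate connections."""
    candidates = local_conn.execute(LOCAL_CANDIDATES_SQL).fetchall()
    on_commons = titles_on_commons(
        commons_conn, [il_to for _, il_to in candidates]
    )
    missing = [(title, il_to) for title, il_to in candidates
               if il_to not in on_commons]
    # The limit is applied after the Commons check, so it counts only
    # links that are missing on both wikis.
    return missing[:limit]


def make_demo_db(pages, imagelinks=()):
    """Build a tiny in-memory stand-in for one wiki's replica (demo only)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE page "
        "(page_id INTEGER, page_namespace INTEGER, page_title TEXT)"
    )
    conn.execute("CREATE TABLE imagelinks (il_from INTEGER, il_to TEXT)")
    conn.executemany("INSERT INTO page VALUES (?, ?, ?)", pages)
    conn.executemany("INSERT INTO imagelinks VALUES (?, ?)", imagelinks)
    return conn
```

One caveat with this approach is that the local candidate list can be large on wikis that rely heavily on Commons, since most files used there have no local File: page. That is part of why a server-side alternative would still be preferable.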

Event Timeline

Andrew triaged this task as Medium priority. Dec 8 2020, 5:27 PM
Andrew moved this task from Inbox to Doing on the cloud-services-team (Kanban) board.