The following is an analytical query that we used to run every week on fawiki but now it takes too long (more than 30 minutes) and I need help optimizing the query. The query aims to identify pages in which there exists a link to a non-existing file (nearly always, an image). Obviously, the query should exclude links to files that don't exist locally but are present on Commons.
Adding Bryan and MA per previous discussion on Cloud-l in which they suggested that bringing these queries to their attention could potentially help with creation of new indexes that could benefit many more users.
SELECT page_title, il_to FROM page JOIN imagelinks ON il_from = page_id WHERE page_namespace = 0 AND il_to NOT IN ( SELECT page_title FROM page WHERE page_namespace = 6 ) AND il_to NOT IN ( SELECT page_title FROM commonswiki_p.page WHERE page_namespace = 6 ) LIMIT 10000