Renaming a file does not clear the location data for the original file in the geo_tags table
Open, Medium, Public

Description

Redirecting a file with location data, e.g.

https://commons.wikimedia.org/w/index.php?title=File:Berlin,_Germany_-_panoramio.jpg&redirect=no

leaves two entries in the geo_tags table: one for the file and one for its redirect. The following queries show both:

use commonswiki_p;
select * from page where page_id in (55516538,54304734);
select * from geo_tags where gt_page_id in (55516538,54304734);

(It is fine to find two entries in the page table, one of them marked as a redirect.)
The redirect page itself no longer contains any location data, but any application reading geo_tags will still find the stale entry. If that location data is wrong, it is not clear where it can be fixed for the redirect.

I tried adding coordinates to a redirect and removing them again; this clears the redirect's entry in geo_tags. Purging the page did not.

What has to be done:

  • any rename should remove the old file's location data from the geo_tags table
  • a purge should also remove any leftover entries from the geo_tags table
  • do the necessary cleanup: remove geo_tags entries for redirects that no longer contain location data.
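As a starting point for the cleanup step, the affected rows could be listed with a query along these lines. This is only a sketch for the read-only replica: it assumes the usual GeoData columns gt_id, gt_lat and gt_lon in geo_tags, and it only finds candidates. It does not check whether a redirect page still contains coordinates in its wikitext, which would have to be verified before deleting anything.

```sql
-- Sketch: list geo_tags rows that belong to redirect pages in the File namespace.
-- Assumption: gt_id, gt_lat and gt_lon exist as in the standard GeoData schema.
USE commonswiki_p;
SELECT p.page_id,
       p.page_title,
       g.gt_id,
       g.gt_lat,
       g.gt_lon
FROM geo_tags AS g
JOIN page AS p ON g.gt_page_id = p.page_id
WHERE p.page_namespace = 6      -- File namespace
  AND p.page_is_redirect = 1    -- only redirects
LIMIT 20;                       -- small sample for manual inspection
```

The LIMIT keeps the result small enough to check a few redirects by hand before any bulk cleanup is considered.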

Event Timeline

Restricted Application added a subscriber: Aklapper.

There are ~130600 redirects with location data in geo_tags (for commons file namespace):

use commonswiki_p;
SELECT COUNT(*)
FROM geo_tags AS g1
JOIN page AS p1 ON g1.gt_page_id = p1.page_id
WHERE p1.page_namespace = 6
  AND p1.page_is_redirect = 1;

Similar entries exist in the main namespace of other wikis, e.g.:

  • enwiki_p: 190 matches
  • dewiki_p: 4300 matches
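The query used for these counts is not shown; presumably it is the Commons query with the namespace changed to 0. A sketch under that assumption, for enwiki_p:

```sql
-- Sketch: count geo_tags rows attached to redirects in the main namespace.
-- Assumption: same query as for Commons, with page_namespace = 0.
USE enwiki_p;
SELECT COUNT(*)
FROM geo_tags AS g1
JOIN page AS p1 ON g1.gt_page_id = p1.page_id
WHERE p1.page_namespace = 0     -- main (article) namespace
  AND p1.page_is_redirect = 1;
```

Note that COUNT(*) counts geo_tags rows, not pages. If a redirect has more than one tag, COUNT(DISTINCT p1.page_id) would give the number of affected redirects instead.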
CBogen triaged this task as High priority. Dec 7 2020, 4:24 PM
CBogen moved this task from needs triage to Geodata on the Discovery-Search board.
Gehel lowered the priority of this task from High to Medium. Jan 21 2021, 3:25 PM