
renaming a file does not clear location data for original file in db-table geo_tags
Closed, Resolved · Public

Description

Renaming a file with location data, which leaves a redirect at the old title, e.g.

https://commons.wikimedia.org/w/index.php?title=File:Berlin,_Germany_-_panoramio.jpg&redirect=no

leaves two entries in the geo_tags table. The entries for the above file and its redirect can be seen with:

use commonswiki_p;
select * from page where page_id in (55516538,54304734);
select * from geo_tags where gt_page_id in (55516538,54304734);

(It is OK to find two entries in the page table, one of them marked as the redirect.)
The redirect page itself no longer contains any location data, but any app using geo_tags will still find an entry for it. Especially when the location data is wrong, it is not clear where to fix it for the redirect.

Adding coordinates to a redirect and then removing them again does clear the redirect's entry in geo_tags. Purging the page did not.

What has to be done:

  • any rename should remove the old file name (now a redirect) and its location data from the geo_tags table
  • a purge should also remove any leftover entries from the geo_tags table
  • do the necessary cleanup: remove geo_tags entries for redirects that no longer contain any location data
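The cleanup step could look roughly like the following. This is only a sketch against an in-memory SQLite stand-in, not against the real replicas (which are read-only): the reduced schema, the sample rows and the helper names `build_demo_db`, `stale_redirect_tags` and `delete_redirect_tags` are assumptions made for illustration. It also assumes that every redirect row is stale; it does not check whether a redirect page still carries coordinates of its own.

```python
import sqlite3


def build_demo_db():
    """Create a toy database with reduced page and geo_tags tables (assumed schema)."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE page (
            page_id INTEGER PRIMARY KEY,
            page_namespace INTEGER,
            page_title TEXT,
            page_is_redirect INTEGER
        );
        CREATE TABLE geo_tags (
            gt_id INTEGER PRIMARY KEY,
            gt_page_id INTEGER,
            gt_lat REAL,
            gt_lon REAL
        );
    """)
    # Hypothetical rows: a renamed file, the redirect left behind,
    # and a main-namespace redirect that must not be touched by a namespace-6 cleanup.
    conn.executemany("INSERT INTO page VALUES (?, ?, ?, ?)", [
        (1, 6, "New_name.jpg", 0),
        (2, 6, "Old_name.jpg", 1),
        (3, 0, "Some_article", 1),
    ])
    conn.executemany("INSERT INTO geo_tags VALUES (?, ?, ?, ?)", [
        (1, 1, 52.52, 13.405),
        (2, 2, 52.52, 13.405),
        (3, 3, 48.1, 11.6),
    ])
    return conn


def stale_redirect_tags(conn, namespace=6):
    """List geo_tags rows that belong to redirect pages in one namespace."""
    return conn.execute(
        """
        SELECT g.gt_id, p.page_id, p.page_title
        FROM geo_tags AS g
        JOIN page AS p ON g.gt_page_id = p.page_id
        WHERE p.page_namespace = ? AND p.page_is_redirect = 1
        ORDER BY g.gt_id
        """,
        (namespace,),
    ).fetchall()


def delete_redirect_tags(conn, namespace=6):
    """Delete geo_tags rows of redirect pages; returns the number of rows removed."""
    cur = conn.execute(
        """
        DELETE FROM geo_tags
        WHERE gt_page_id IN (
            SELECT page_id FROM page
            WHERE page_namespace = ? AND page_is_redirect = 1
        )
        """,
        (namespace,),
    )
    return cur.rowcount


conn = build_demo_db()
stale = stale_redirect_tags(conn)      # rows that the cleanup would remove
removed = delete_redirect_tags(conn)   # perform the cleanup on the toy data
remaining = conn.execute(
    "SELECT gt_page_id FROM geo_tags ORDER BY gt_id"
).fetchall()
```

Listing the candidate rows before deleting them keeps the cleanup reviewable; on a production wiki the deletion would have to go through the extension or a maintenance script rather than direct SQL.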

Event Timeline

Restricted Application added a subscriber: Aklapper.

There are ~130600 redirects with location data in geo_tags (Commons, File namespace):

use commonswiki_p;
SELECT COUNT(*)
FROM geo_tags AS g1 JOIN page AS p1 ON g1.gt_page_id = p1.page_id
WHERE p1.page_namespace = 6
AND p1.page_is_redirect = 1
;

Similar entries exist in the main namespace of other wikis, e.g.

  • enwiki_p: 190 matches
  • dewiki_p: 4300 matches

CBogen triaged this task as High priority. · Dec 7 2020, 4:24 PM
CBogen moved this task from needs triage to Geodata on the Discovery-Search board.
Gehel lowered the priority of this task from High to Medium. · Jan 21 2021, 3:25 PM

GeoData does not store its data under the name of a page. It uses the page_id.
The page_id is stable across moves/renames.

For files, GeoData also extracts the coordinates from Exif data. To get the metadata of a file, it looks the file up by name. That lookup resolves the redirect, so the coordinates of the target file are returned for the redirect as well.

GeoData also searches for files across all repos. That means that for the file description page of a shared file on a Wikipedia, the coordinates of the Commons file would also get stored in the database.
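The mechanism described in this comment can be modelled in a few lines. The sketch below is a toy model, not GeoData's actual PHP code: the `File` class, the dictionary used as a repo, and the functions `find_file`, `exif_coord_unguarded` and `exif_coord_guarded` are all hypothetical names chosen for illustration. The guarded variant shows one possible condition, assuming Exif coordinates should only be taken from local files that are not redirects.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Tuple

Coord = Tuple[float, float]


@dataclass
class File:
    """Toy stand-in for a file record (hypothetical, not a MediaWiki class)."""
    name: str
    is_local: bool
    exif_coord: Optional[Coord] = None
    redirect_to: Optional[str] = None


def find_file(repo: Dict[str, File], name: str) -> Optional[File]:
    """Look a file up by name and follow redirects to the final target."""
    seen = set()
    f = repo.get(name)
    while f is not None and f.redirect_to is not None and f.name not in seen:
        seen.add(f.name)  # guard against redirect loops
        f = repo.get(f.redirect_to)
    return f


def exif_coord_unguarded(repo: Dict[str, File], title: str) -> Optional[Coord]:
    """Behaviour described in the comment: the lookup resolves redirects and
    ignores where the file lives, so redirects and shared files get coordinates."""
    f = find_file(repo, title)
    return f.exif_coord if f is not None else None


def exif_coord_guarded(repo: Dict[str, File], title: str) -> Optional[Coord]:
    """Assumed guard: only local files that are not redirects yield coordinates."""
    f = repo.get(title)
    if f is None or not f.is_local or f.redirect_to is not None:
        return None
    return f.exif_coord


# Hypothetical sample data: a renamed file, its redirect, and a shared (non-local) file.
repo = {
    "New_name.jpg": File("New_name.jpg", is_local=True, exif_coord=(52.52, 13.405)),
    "Old_name.jpg": File("Old_name.jpg", is_local=True, redirect_to="New_name.jpg"),
    "Shared.jpg": File("Shared.jpg", is_local=False, exif_coord=(48.1, 11.6)),
}

redirect_unguarded = exif_coord_unguarded(repo, "Old_name.jpg")
redirect_guarded = exif_coord_guarded(repo, "Old_name.jpg")
shared_unguarded = exif_coord_unguarded(repo, "Shared.jpg")
shared_guarded = exif_coord_guarded(repo, "Shared.jpg")
target_guarded = exif_coord_guarded(repo, "New_name.jpg")
```

In this model the unguarded lookup hands the target's coordinates to the redirect and the Commons coordinates to the local description page, while the guarded lookup returns coordinates only for the renamed local file itself.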

Change 742526 had a related patch set uploaded (by Umherirrender; author: Umherirrender):

[mediawiki/extensions/GeoData@master] Add coordinates only from local files, which are not redirects

https://gerrit.wikimedia.org/r/742526

TheDJ assigned this task to Umherirrender.

Change 742526 merged by jenkins-bot:

[mediawiki/extensions/GeoData@master] Add coordinates only from local files, which are not redirects

https://gerrit.wikimedia.org/r/742526