Investigate why tracker template is not detected on Austrian images
Open, Needs Triage · Public

Description

From :c:Commons_talk:Monuments_database/Images_without_id

Images without id reports Hall_in_Tirol,_Haus_Unterer_Stadtplatz_8a.jpg as not having an id template despite it having one.

Event Timeline

Running the queries manually in Quarry to see where the image shows up and where it doesn't:

  • getMonumentsWithTemplate -- contains Hall in Tirol, Haus Unterer Stadtplatz 8a.jpg
  • getMonumentsWithoutTemplate -- empty dataset because the base category is unpopulated and subcategories are not of the form Cultural heritage monuments in Austria in %.
  • Proxying getMonumentsWithPhoto by running SELECT image, id FROM monuments_all WHERE NOT image='' AND country="at" AND lang="de" AND id=44715; from Toolforge (because Quarry does not have access) -- contains Hall in Tirol, Haus Unterer Stadtplatz 8a.jpg

Running the script in docker for at_de (after first harvesting):
getMonumentsWithoutTemplate output contains Hall_in_Tirol,_Haus_Unterer_Stadtplatz_8a.jpg
getMonumentsWithPhoto output does not contain Hall_in_Tirol,_Haus_Unterer_Stadtplatz_8a.jpg

Digging into this a bit more, the monuments database contains Hall_in_Tirol,_Haus_Unterer_Stadtpatz_8a.JPG whereas the file is called Hall_in_Tirol,_Haus_Unterer_Stadtplatz_8a.jpg on Commons.

So the cause is a redirect in this case. The proper solution would be to resolve redirects at harvest time, but that is likely expensive.
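For illustration, here is a minimal sketch of how redirect resolution could look if done in batches through the MediaWiki API (`action=query` with the `redirects` parameter, which reports `normalized` and `redirects` mappings as `from`/`to` pairs). The function names `build_redirect_query` and `resolve_titles` are hypothetical and not part of the existing scripts, and the sketch only builds the request parameters and interprets a response; it makes no network call itself.

```python
def build_redirect_query(titles):
    """Build MediaWiki API parameters asking the server to resolve redirects.

    Hypothetical helper: joining several titles with "|" keeps the number
    of requests low, which matters if this runs during harvest.
    """
    return {
        "action": "query",
        "format": "json",
        "redirects": 1,
        "titles": "|".join(titles),
    }


def resolve_titles(titles, response):
    """Map each requested title to the title it finally resolves to.

    `response` is the decoded JSON of an action=query request made with
    the parameters above. Titles that are neither normalized nor
    redirected map to themselves.
    """
    query = response.get("query", {})
    normalized = {n["from"]: n["to"] for n in query.get("normalized", [])}
    redirects = {r["from"]: r["to"] for r in query.get("redirects", [])}

    resolved = {}
    for title in titles:
        current = normalized.get(title, title)
        seen = set()  # guard against redirect loops
        while current in redirects and current not in seen:
            seen.add(current)
            current = redirects[current]
        resolved[title] = current
    return resolved
```

With the file from this task, a response listing the stored name under `redirects` would map `File:Hall_in_Tirol,_Haus_Unterer_Stadtpatz_8a.JPG` to the name actually used on Commons, so the harvested value could be rewritten before the report queries run.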

Looking at the others, the majority seem to be redirects or local uploads (some of which also exist on Commons). There is also at least one case of an image being in the monuments database with URL-encoded characters.
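The URL-encoded case could probably be handled without any lookup by normalizing names before they are compared. The following is a rough sketch only; `normalize_image_name` is a hypothetical helper, not something that exists in the harvesting code. It assumes that names should be compared in the form with spaces rather than underscores, and it deliberately leaves the case of the file extension alone, since .JPG and .jpg are different files on Commons.

```python
from urllib.parse import unquote


def normalize_image_name(name: str) -> str:
    """Return a comparable form of an image name (hypothetical helper).

    - percent-decodes URL-encoded characters (e.g. %2C becomes ",")
    - treats underscores and spaces as equivalent
    - collapses repeated whitespace
    - upper-cases the first character
    The rest of the name, including the extension, keeps its case.
    """
    decoded = unquote(name)
    spaced = " ".join(decoded.replace("_", " ").split())
    return spaced[:1].upper() + spaced[1:]
```

One caveat: a name that legitimately contains a percent sign followed by two hex digits would be decoded incorrectly, so it may be safer to only log such entries rather than rewrite them automatically.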

I have the following in my CSS to make redirects stand out:

/* redirect links */
.mw-redirect { color: #008000; }
.mw-redirect:visited { color: #006400; }

Redirects and local images could be tested as part of images_without_id. Not sure it's the most suitable place though. @JeanFred any thoughts?

addCommonsTemplate already checks for page existence so resolving redirects and reporting back existence should be doable without any significant overhead.