
Weird harvest of by_(be-tarask)
Closed, Resolved (Public)

Description

  1. It looks like the harvesting bot only accesses the two pages directly in this category (monuments in Minsk) and ignores all the rest, i.e. all the pages within the subcategories for the different administrative units. The statistics show that only 2 source pages are used.
  2. rajon and oblast-iso in monuments_by_(be-tarask) are always empty, so the coverage of adm1 is at 0.0% according to the stats as well.

Event Timeline

Looks like we've set the harvester to only do namespace 4 (Wikipedia).
Based on this and this the lists were moved to the main namespace in January. (All except the Minsk ones).
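As a rough illustration of why only two pages were picked up, here is a minimal sketch of a namespace filter. The `namespaces` key, the `should_harvest` helper and the config layout are assumptions made for illustration, not the harvester's actual code; the only facts taken from the discussion are that namespace 4 is the Wikipedia (project) namespace and that most lists now live in the main namespace (0).

```python
# Hypothetical per-country config entry; the key names are assumptions.
OLD_CONFIG = {"table": "monuments_by_(be-tarask)", "namespaces": [4]}
NEW_CONFIG = {"table": "monuments_by_(be-tarask)", "namespaces": [0, 4]}


def should_harvest(page_namespace, config):
    """Return True if a page in this namespace is allowed by the config."""
    # Assumption: a missing setting falls back to the main namespace only.
    return page_namespace in config.get("namespaces", [0])


# Lists moved to the main namespace (0) are skipped by the old setting,
# while the Minsk lists left in namespace 4 are still harvested.
for namespace in (0, 4):
    print(namespace, should_harvest(namespace, OLD_CONFIG),
          should_harvest(namespace, NEW_CONFIG))
```

With the old setting only the pages left in namespace 4 pass the filter, which matches the 2 source pages reported in the statistics.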

Change 373053 had a related patch set uploaded (by Lokal Profil; owner: Lokal Profil):
[labs/tools/heritage@master] Allow by_be-tarask lists to be in the main namespace

https://gerrit.wikimedia.org/r/373053

Looking at the templates, it doesn't look as though rajon or oblast-iso were ever there. On top of that, oblast-iso is mapped to "vobłaść-iso", which is in a different alphabet, so I'm guessing it's a copy-paste error. Removing them.

This is really weird because you'd assume these two parameters would have come from the header template (Вікі любіць славутасьці/Вяршыня сьпісу), but I did not find a single instance where the header template is used with any parameters, or any trace of them in the history... Meaning they were never supported. In which case, it's really unclear where the rajon and vobłaść-iso even came from in the first place.

Change 373053 merged by jenkins-bot:
[labs/tools/heritage@master] Allow by_be-tarask lists to be in the main namespace

https://gerrit.wikimedia.org/r/373053

Mentioned in SAL (#wikimedia-cloud) [2017-08-22T12:16:21Z] <Lokal_Profil> Deploy latest from Git master: 166f01d, 1d33262, 7177386 (T173717)

Let's wait for a new harvest, and if that looks OK then this can be closed.

125 source pages now harvested

Looks like the administrative columns oblast-iso / rajon have been removed, leaving the table without any sort of administrative field whatsoever.

Another issue is the coordinates column, which contains templated values like {{Каардынаты|54|06|51.83|паўночнае|25|34|50.12|усходняе|region:BY|выяўленьне=тэкст}} (or in another format, {{каардынаты|52.208333343333|25.556944454444|выяўленьне=тэкст}}) that have not been translated into the lat / lon columns. In fact, it looks like those two are always empty.
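For reference, a minimal sketch of how the two template formats quoted above could be turned into decimal lat / lon values. This is not what the harvester does; `parse_coord_template` and `dms_to_decimal` are hypothetical helpers, and the hemisphere words for south and west (паўднёвае, заходняе) are assumptions, since only the north and east forms appear in the examples.

```python
import re

# Hemisphere words. Only "паўночнае" (north) and "усходняе" (east) appear in
# the examples; the south/west forms are assumed.
NEGATIVE = {"паўднёвае", "заходняе"}
HEMISPHERES = {"паўночнае", "усходняе"} | NEGATIVE

TEMPLATE_RE = re.compile(r"\{\{\s*каардынаты\s*\|([^{}]*)\}\}", re.IGNORECASE)


def dms_to_decimal(degrees, minutes, seconds, hemisphere):
    """Convert degrees/minutes/seconds plus a hemisphere word to a float."""
    value = float(degrees) + float(minutes) / 60 + float(seconds) / 3600
    return -value if hemisphere in NEGATIVE else value


def parse_coord_template(text):
    """Return (lat, lon) as floats, or None if the format is not recognised."""
    match = TEMPLATE_RE.search(text)
    if not match:
        return None
    parts = [p.strip() for p in match.group(1).split("|")]
    # Drop named parameters (key=value) and "region:XX" style parameters.
    parts = [p for p in parts if p and "=" not in p and ":" not in p]
    try:
        if len(parts) == 2:
            # Decimal format: {{каардынаты|lat|lon|...}}
            return float(parts[0]), float(parts[1])
        if (len(parts) == 8 and parts[3] in HEMISPHERES
                and parts[7] in HEMISPHERES):
            # DMS format: deg|min|sec|hemisphere for lat, then for lon.
            return dms_to_decimal(*parts[0:4]), dms_to_decimal(*parts[4:8])
    except ValueError:
        return None
    return None


print(parse_coord_template(
    "{{каардынаты|52.208333343333|25.556944454444|выяўленьне=тэкст}}"))
```

Anything that does not match one of the two known layouts returns None, so unexpected values stay visible as missing coordinates rather than being silently mis-parsed.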

It's probably worth pinging someone who works on the actual lists to figure this one out.

Another issue is the coordinates column, which contains templated values like {{Каардынаты|54|06|51.83|паўночнае|25|34|50.12|усходняе|region:BY|выяўленьне=тэкст}} (or in another format, {{каардынаты|52.208333343333|25.556944454444|выяўленьне=тэкст}}) that have not been translated into the lat / lon columns. In fact, it looks like those two are always empty.

It's the same situation for the Philippines. I added a patch (not merged yet) to at least harvest these templates into the local table and to drop the unused lat/lon columns.