
Update configuration for ('ru','ru')
Closed, Resolved (Public)

Description

Hi! The database of Russian monuments needs to be modified. The country "ru-old" is no longer needed. These lists are obsolete, and I am not sure that they even exist. The rules for the country "ru" are outdated as well.

Up-to-date Russian lists are located at https://ru.wikivoyage.org/wiki/Культурное наследие России/ (most of the lists are subpages). The lists are organized using the {{monument}} template. If necessary, I can provide the mapping of the template's parameters onto the fields that the bot reads.

Fine print: this year we also have the lists of Crimean monuments that are organized as subpages of https://ru.wikivoyage.org/wiki/Культурное наследие/
The country name is omitted in the pagetitle in order to avoid the delicate issue of which state Crimea belongs to. These lists could be uploaded into the database as well, although they are mostly repeating Ukrainian lists, so there will be lots of overlaps. Skipping the Crimean lists for the time being can also be a solution.

Event Timeline

JeanFred claimed this task.
JeanFred raised the priority of this task to Needs Triage.
JeanFred updated the task description.
JeanFred added subscribers: JeanFred, Atsirlin.

Is there any chance to have this done before WLM starts? We would like to add a link to the map on our landing page, but it makes no sense when Russian monuments are not in the database.

Another request to the ErfgoedBot maintainers is to disable any categorization for ru. We have our own tools developed last year, and it's better if ErfgoedBot simply does not interfere.

So to sum this up: http://git.wikimedia.org/blob/labs%2Ftools%2Fheritage.git/5425dfd13e821d0d85577bd0c6cc52355b964674/erfgoedbot%2Fmonuments_config.py#L6838 can just be removed without looking back.

http://git.wikimedia.org/blob/labs%2Ftools%2Fheritage.git/5425dfd13e821d0d85577bd0c6cc52355b964674/erfgoedbot%2Fmonuments_config.py#L6903 needs to be updated.

Can you just copy, update and paste the top part here so we know how to configure it?

('ru', 'ru') : { # Monuments in Russia in Russian.
    'project' : u'wikipedia',
    'lang' : u'ru',
    'headerTemplate' : u'WLM/заголовок',
    'rowTemplate' : u'WLM/строка',
    'commonsTemplate' : u'Cultural Heritage Russia',
    'commonsTrackerCategory' : u'Cultural heritage monuments in Russia with known IDs',
    'commonsCategoryBase' : u'Cultural heritage monuments in Russia',
    'autoGeocode' : False,
    'unusedImagesPage' : u'Проект:Вики любит памятники/Unused images of cultural heritage monuments',
    'imagesWithoutIdPage' : u'Проект:Вики любит памятники/Images of Cultural heritage monuments without ID',
    'registrantUrlBase' : u'http://kulturnoe-nasledie.ru/monuments.php?id=%s',
    'namespaces' : [0, 104],
    'table' : u'monuments_ru_(ru)',
    'truncate' : False,
    'primkey' : u'id',

If any fields changed, please include those too. To disable categorization we can set commonsTemplate to False. We should probably file a bug to do this in a more elegant way, because removing that template also kills the indexing of images.
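To illustrate the trade-off mentioned above, here is a minimal sketch of how a single 'commonsTemplate' setting could act as the switch. The function name categorization_enabled is hypothetical and is not taken from the ErfgoedBot code. The sketch only shows that a False value and a template name are easy to tell apart, and why one flag that drives both categorization and image indexing turns both off at once.

```python
def categorization_enabled(country_config):
    """Hypothetical helper: True only if a Commons template is configured.

    Assumption: a falsy 'commonsTemplate' (False, None, empty string, or a
    missing key) means "do not categorize". Any feature that relies on the
    same key, such as image indexing, would be switched off as well.
    """
    return bool(country_config.get('commonsTemplate'))


# Config with categorization disabled, as proposed in this task.
ru_disabled = {'commonsTemplate': False}

# Config with the template name from the existing ru entry.
ru_enabled = {'commonsTemplate': u'Cultural Heritage Russia'}

disabled_result = categorization_enabled(ru_disabled)
enabled_result = categorization_enabled(ru_enabled)
```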

Thank you, Maarten! The updated part of the code for ru is below. I am not sure what 'primkey' means. Perhaps it should be changed from 'id' to 'knid' if that's the name of the monument ID on our side.

Regarding the categorization, removing 'commonsTemplate' may be the easiest thing to do because we want neither categorization nor lists of unused images that tend to grow extremely long and, thus, useless.

('ru', 'ru') : { # Monuments in Russia in Russian.
    'project' : u'wikivoyage',
    'lang' : u'ru',
    'headerTemplate' : u'monument-title',
    'rowTemplate' : u'monument',
    'commonsTemplate' : u'Cultural Heritage Russia',
    'commonsTrackerCategory' : u'Cultural heritage monuments in Russia with known IDs',
    'commonsCategoryBase' : u'Cultural heritage monuments in Russia',
    'autoGeocode' : False,
    'unusedImagesPage' : u'Культурное наследие России/Неиспользуемые изображения',
    'imagesWithoutIdPage' : u'Культурное наследие России/Проблемные изображения',
    'registrantUrlBase' : u'http://kulturnoe-nasledie.ru/monuments.php?id=%s',
    'namespaces' : [0, 104],
    'table' : u'monuments_ru_(ru)',
    'truncate' : False,
    'primkey' : u'id',
    'fields' : [
        {
            'source' : u'knid',
            'dest' : u'id',
            'type' : 'varchar(15)',
            'default' : '0',
        },
        {
            'source' : u'name',
            'dest' : u'name',
        },
        {
            'source' : u'region',
            'dest' : u'region',
        },
        {
            'source' : u'region',
            'dest' : u'region_iso',
        },
        {
            'source' : u'district',
            'dest' : u'district',
        },
        {
            'source' : u'municipality',
            'dest' : u'city',
        },
        {
            'source' : u'address',
            'dest' : u'address',
        },
        {
            'source' : u'lat',
            'dest' : u'lat',
        },
        {
            'source' : u'long',
            'dest' : u'lon',
        },
        {
            'source' : u'image',
            'dest' : u'image',
        },
        {
            'source' : u'commonscat',
            'dest' : u'commonscat',
        },
        {
            'source' : u'wiki',
            'dest' : u'monument_article',
        },
        {
            'source' : u'knid',
            'dest' : u'registrant_url',
            'conv' : u'generateRegistrantUrl',
        },
    ],
},
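For readers unfamiliar with the 'fields' list, the following is a rough sketch of how a 'source' to 'dest' mapping of this kind could be applied to one {{monument}} row. It is not the ErfgoedBot implementation: build_row is a hypothetical helper, only a few of the fields are repeated here, the knid value is made up, and the handling of 'conv' is an assumption based on the 'registrantUrlBase' pattern in the configuration.

```python
# Assumed to match 'registrantUrlBase' in the configuration.
REGISTRANT_URL_BASE = u'http://kulturnoe-nasledie.ru/monuments.php?id=%s'

# A subset of the field mapping from the ru configuration.
FIELDS = [
    {'source': u'knid', 'dest': u'id', 'default': '0'},
    {'source': u'name', 'dest': u'name'},
    {'source': u'municipality', 'dest': u'city'},
    {'source': u'long', 'dest': u'lon'},
    {'source': u'knid', 'dest': u'registrant_url',
     'conv': u'generateRegistrantUrl'},
]


def build_row(template_params, fields, url_base):
    """Hypothetical helper: copy template parameters into database columns."""
    row = {}
    for field in fields:
        # Fall back to the field's default (or an empty string) if the
        # template parameter is missing on the wiki page.
        value = template_params.get(field['source'], field.get('default', u''))
        # Assumed behaviour of the 'generateRegistrantUrl' conversion:
        # substitute the ID into the registrant URL pattern.
        if field.get('conv') == u'generateRegistrantUrl' and value:
            value = url_base % value
        row[field['dest']] = value
    return row


# Made-up template parameters for illustration only.
params = {u'knid': u'7710001000', u'name': u'Example monument', u'long': u'37.62'}
row = build_row(params, FIELDS, REGISTRANT_URL_BASE)
```

Note that the same source parameter (knid) feeds two destination columns, id and registrant_url, which is why it appears twice in the configuration.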

https://ru.wikipedia.org/wiki/%D0%A8%D0%B0%D0%B1%D0%BB%D0%BE%D0%BD:Monument does not exist and neither does https://ru.wikipedia.org/wiki/%D0%A8%D0%B0%D0%B1%D0%BB%D0%BE%D0%BD:Monument-title . Please provide the correct templates. I see all the field names you provided are in English. I doubt that the Russian Wikipedia would be using English; that would be weird. More info at https://commons.wikimedia.org/wiki/Commons:Monuments_database/Harvesting

The primkey is the primary key (https://en.wikipedia.org/wiki/Unique_key) that's unique for all the monuments.
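In the configuration posted in this task, the template parameter knid is mapped to the destination column id, and 'primkey' is set to 'id', so the primary key appears to refer to the destination column rather than the template parameter. As a small sketch of what "unique for all the monuments" means in practice, the hypothetical helper below (not part of ErfgoedBot) reports key values that occur more than once in a set of harvested rows. The sample rows are invented.

```python
from collections import Counter


def duplicate_keys(rows, primkey):
    """Return the sorted primary-key values that appear in more than one row."""
    counts = Counter(row[primkey] for row in rows)
    return sorted(key for key, n in counts.items() if n > 1)


# Invented sample rows; the first and third share the same id.
rows = [
    {'id': '1001', 'name': 'Monument A'},
    {'id': '1002', 'name': 'Monument B'},
    {'id': '1001', 'name': 'Monument C'},
]

dups = duplicate_keys(rows, 'id')
```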

Maarten, the lists are on Russian Wikivoyage, not on Wikipedia. Note that I have changed the 'project' field in the very beginning of the code.

All parameter names are in English indeed. It is intentional.

Change 235432 had a related patch set uploaded (by Jean-Frédéric):
Update ru configuration

https://gerrit.wikimedia.org/r/235432

Change 235432 merged by Jean-Frédéric:
Update ru configuration

https://gerrit.wikimedia.org/r/235432

Change 235438 had a related patch set uploaded (by Jean-Frédéric):
Template names in monuments-config are case-sensentive

https://gerrit.wikimedia.org/r/235438

Change 235438 merged by Jean-Frédéric:
Template names in monuments-config are case-sensentive

https://gerrit.wikimedia.org/r/235438
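As an illustration of why the case of the configured template name matters, here is a minimal sketch that assumes the configured name is compared to the names found on a page with exact string equality. That assumption is inferred from the commit title only; find_template is a hypothetical helper, not the bot's actual lookup.

```python
def find_template(configured_name, names_on_page):
    """Hypothetical exact, case-sensitive lookup of a template name."""
    return configured_name in names_on_page


# Template names as they are written in the ru configuration.
names_on_page = [u'monument-title', u'monument']

exact_match = find_template(u'monument', names_on_page)
wrong_case = find_template(u'Monument', names_on_page)
```

Under this assumption, 'Monument' and 'monument' are different names, so the configuration has to spell the template exactly as the lists do.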

Configuration updated. Harvesting is in progress.

Harvesting done and table updated. @Atsirlin, can you confirm that all is good with Russian monuments in the database?

@JeanFred, thank you! There are two issues:

  • On your side: something is wrong with the character encoding for pagenames. When you click on the monument ID, it does not forward you to the actual list of monuments because the pagename is written in odd characters. It looks like win1251 instead of utf8, or something like that. Could you check it please? All other fields are OK.
  • On our side, both cultural heritage (WLM) and natural (WLE) monument lists are using the same template, so the natural monuments were added into the database too. I will rename the template for natural monuments. I should have done it earlier.

> On your side: something is wrong with the character encoding for pagenames. When you click on the monument ID, it does not forward you to the actual list of monuments because the pagename is written in odd characters. It looks like win1251 instead of utf8, or something like that. Could you check it please? All other fields are OK.

Quite annoying indeed. Other countries with non-Latin scripts (like Arabic) are probably affected too. I’ll have a look as soon as possible; tracked at T111526.

> On our side, both cultural heritage (WLM) and natural (WLE) monument lists are using the same template, so the natural monuments were added into the database too. I will rename the template for natural monuments. I should have done it earlier.

Right, that makes sense. Thanks for doing it! (In the future we might want to include natural monuments for WLE in the monuments database).
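Regarding the pagename encoding problem reported above: the actual cause is not established in this task, but the symptom described ("looks like win1251 instead of utf8") can be reproduced with a few lines. The sketch below assumes the page title is stored as UTF-8 bytes and then wrongly decoded as Windows-1251 (cp1251 in Python); it shows the garbling and that it is reversible for this title.

```python
# A page title from the Russian Wikivoyage lists.
title = u'Культурное наследие России'

# Correct storage: UTF-8 bytes.
utf8_bytes = title.encode('utf-8')

# Assumed bug: the UTF-8 bytes are interpreted as cp1251 (win1251).
# Each two-byte Cyrillic letter turns into two unrelated characters.
garbled = utf8_bytes.decode('cp1251')

# Undoing the wrong decode recovers the original title.
recovered = garbled.encode('cp1251').decode('utf-8')
```

Because cp1251 is a single-byte encoding, the garbled string has one character per UTF-8 byte, so it is roughly twice as long as the original Cyrillic title.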

JeanFred triaged this task as High priority. Sep 4 2015, 2:11 PM
JeanFred removed a project: Patch-For-Review.
JeanFred set Security to None.