removeLanguageLinks doesn't always remove obsolete codes
Closed, Resolved (Public)

Description

There is a very unusual scenario where textlib.removeLanguageLinks will not remove obsolete interlanguage links.

removeLanguageLinks includes:

    if not site.validLanguageLinks():
        return text
...

    languages = '|'.join(site.validLanguageLinks() +
                         list(site.family.obsolete.keys()))

It exits early if site.validLanguageLinks() returns an empty list. The current algorithm for site.validLanguageLinks() is:

    nsnames = [name for name in self.namespaces.values()]
    return [lang for lang in self.languages()
            if first_upper(lang) not in nsnames]

The scenario which does not work is:

  1. A wiki family called foo has two codes, 'en' and 'de'. The interwikimap is set up so that en is the English-language site and de is the German-language site, and the two are linked together.
  2. The wiki community decides that maintaining two wikis is duplicated effort, so they merge them into one multilingual wiki. The de site is retired, all its content is copied to the en site, and the sysadmin deletes de from the interwikimap of en.
  3. A bot operator creates a Family that lists de and en in its obsolete mapping and puts the new site code foo in langs, because it is now a multilingual site.
  4. The bots running on foo are supposed to remove all 'de' interlanguage links from the foo (formerly en) site. This does not happen, because 'Foo' is the name of the Project namespace: validLanguageLinks() filters out foo and returns an empty list, so removeLanguageLinks returns the text unchanged.
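The scenario can be reproduced without a live wiki. The following is a minimal sketch, not pywikibot code: FakeSite, FakeFamily, the namespace table, the values in the obsolete mapping and the simplified regular expression are all stand-ins invented for illustration. Only the guard and the prefix list are copied from the excerpts above.

```python
import re


def first_upper(s):
    """Uppercase only the first character (stand-in for the pywikibot helper)."""
    return s[:1].upper() + s[1:]


class FakeFamily:
    """Hypothetical stand-in for a Family; only `obsolete` is modelled."""

    def __init__(self, obsolete):
        self.obsolete = obsolete


class FakeSite:
    """Hypothetical stand-in for a site with a single language code."""

    def __init__(self, code, obsolete):
        self._code = code
        self.family = FakeFamily(obsolete)
        # Assumed namespace table: 'Foo' is the Project namespace, as in step 4.
        self.namespaces = {0: '', 4: 'Foo', 5: 'Foo talk'}

    def languages(self):
        return [self._code]

    def validLanguageLinks(self):
        # Same filter as the algorithm quoted in the description.
        nsnames = [name for name in self.namespaces.values()]
        return [lang for lang in self.languages()
                if first_upper(lang) not in nsnames]


def remove_language_links(text, site):
    """Early-exit guard and prefix list as quoted above, plus a simplified regex."""
    if not site.validLanguageLinks():
        return text
    languages = '|'.join(site.validLanguageLinks() +
                         list(site.family.obsolete.keys()))
    # Simplified pattern (an assumption); the real one handles more cases.
    pattern = r'\[\[\s*(?:%s)\s*:[^\[\]\n]*\]\]\s*' % languages
    return re.sub(pattern, '', text)


text = 'Body text\n[[de:Seite]]\n'

# Merged site: code 'foo' collides with the 'Foo' namespace name.
foo_site = FakeSite('foo', {'de': 'foo', 'en': 'foo'})
print(foo_site.validLanguageLinks())                # []
print(repr(remove_language_links(text, foo_site)))  # text comes back unchanged

# For contrast: a code that is not a namespace name.
en_site = FakeSite('en', {'de': 'en'})
print(repr(remove_language_links(text, en_site)))   # 'Body text\n'
```

With the code foo the obsolete de link survives, while a code that does not match a namespace name lets the same function strip it.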

The workaround is for the Family to use a non-namespace code (e.g. en) until the de interlanguage links have been removed, and only then change the code from en to foo.

Or, we can fix the bug ;-)
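One possible shape for a fix is to build the combined prefix list once and exit early only if that combined list is empty, so obsolete codes are still removed when there are no valid language links. This is a sketch under stated assumptions, not a copy of any merged change: the function name, its plain-list parameters and the simplified regular expression are hypothetical.

```python
import re


def remove_language_links_sketch(text, valid_links, obsolete_codes):
    """Strip interlanguage links for both valid and obsolete prefixes.

    `valid_links` stands in for site.validLanguageLinks() and
    `obsolete_codes` for site.family.obsolete.keys().
    """
    # Build the prefix list once, then test the combined list.
    prefixes = list(valid_links) + list(obsolete_codes)
    if not prefixes:
        return text
    languages = '|'.join(re.escape(p) for p in prefixes)
    # Simplified pattern (an assumption); the real one handles more cases.
    pattern = r'\[\[\s*(?:%s)\s*:[^\[\]\n]*\]\]\s*' % languages
    return re.sub(pattern, '', text)


# No valid language links (as on the foo site), but two obsolete codes.
print(repr(remove_language_links_sketch(
    'Body text\n[[de:Seite]]\n', [], ['de', 'en'])))  # 'Body text\n'
```

The early exit is kept for the case where there is nothing to match at all, which avoids compiling an empty alternation.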

Event Timeline

jayvdb raised the priority of this task to Needs Triage.
jayvdb updated the task description.
jayvdb added subscribers: jayvdb, XZise, Malafaya.

Change 237612 had a related patch set uploaded (by Malafaya):
Cache validLanguageLinks in removeLanguageLinks()

https://gerrit.wikimedia.org/r/237612

Change 237612 merged by jenkins-bot:
Build lang prefixes list once in removeLanguageLinks()

https://gerrit.wikimedia.org/r/237612

jayvdb assigned this task to Malafaya.
jayvdb triaged this task as Low priority.
jayvdb removed a project: Patch-For-Review.
jayvdb set Security to None.