A lot of wikisource urls are redirected to the "mul" site
Open, Low, Public

Description

The following error occurs because some URLs in the interwiki map have no corresponding site. They are redirected to wikisource.org instead.

ERROR: test_attributes_after_run (tests.generate_family_files_tests.TestGenerateFamilyFiles)
Test FamilyFileGenerator attributes after run().
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/home/travis/build/wikimedia/pywikibot/tests/utils.py", line 93, in wrapper
    func(*args, **kwargs)
  File "/home/travis/build/wikimedia/pywikibot/tests/generate_family_files_tests.py", line 65, in test_attributes_after_run
    site = Site(url=lang['url'])
  File "/home/travis/build/wikimedia/pywikibot/pywikibot/__init__.py", line 1229, in Site
    code, fam = _code_fam_from_url(url)
  File "/home/travis/build/wikimedia/pywikibot/pywikibot/__init__.py", line 1186, in _code_fam_from_url
    raise SiteDefinitionError("Unknown URL '{0}'.".format(url))
pywikibot.exceptions.SiteDefinitionError: Unknown URL 'https://gd.wikisource.org/wiki/$1'.

Event Timeline

So what is the best solution? Do we need to accept this redirect in our tests?

Probably a new list in Family file to indicate this redirection

Change 563714 had a related patch set uploaded (by Xqt; owner: Xqt):
[pywikibot/core@master] [tests] Skip SiteDefinitionError for wikisource urls

https://gerrit.wikimedia.org/r/563714

I think https://gerrit.wikimedia.org/r/563714 will help passing tests until we have implemented the redirection list inside the Family files
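The interim approach of skipping unresolvable Wikisource URLs can be sketched roughly as follows. This is not the merged patch: `SiteDefinitionError`, `site_from_url` and `KNOWN_HOSTS` are local stand-ins invented for illustration so the snippet runs without pywikibot.

```python
from urllib.parse import urlparse


class SiteDefinitionError(Exception):
    """Local stand-in for pywikibot.exceptions.SiteDefinitionError."""


# Hypothetical lookup table; the real mapping lives in the family files.
KNOWN_HOSTS = {
    'de.wikisource.org': ('de', 'wikisource'),
    'wikisource.org': ('mul', 'wikisource'),
}


def site_from_url(url):
    """Hypothetical stand-in for Site(url=...): return (code, family)."""
    host = urlparse(url).netloc
    try:
        return KNOWN_HOSTS[host]
    except KeyError:
        raise SiteDefinitionError("Unknown URL '{0}'.".format(url))


def resolvable_sites(urls):
    """Resolve each URL, skipping unknown wikisource hosts only."""
    sites = []
    for url in urls:
        try:
            sites.append(site_from_url(url))
        except SiteDefinitionError:
            if urlparse(url).netloc.endswith('wikisource.org'):
                continue  # known problem: host redirects to wikisource.org
            raise  # any other unknown URL is still an error
    return sites
```

Restricting the skip to Wikisource hosts keeps the workaround narrow, so an unknown URL in any other family would still fail loudly.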

To understand the wanted behaviour, what is the wanted Site for a redirected 'code'?
Site('aa', 'wikisource') = ?
E.g. APISite("mul", "wikisource")?

What is the wanted tuple (site.lang, site.code)?
For 'mul', today it is ('en', 'mul').

Currently the test given above fails after a lot of domains were added to siteinfo that do not actually exist, such as aa.wikisource.org:

# test an existing site
>>> import pwb, pywikibot as py
>>> s = py.Site('de', 'wikisource')
>>> s
APISite("de", "wikisource")
>>> s.hostname()
'de.wikisource.org'

# check 'mul'
>>> s = py.Site('mul', 'wikisource')
>>> s
APISite("mul", "wikisource")
>>> s.hostname()
'wikisource.org'

# '-' is redirected to 'mul' but gives a UserWarning
>>> s = py.Site('-', 'wikisource')
WARNING: <pyshell#16>:1: UserWarning: Site wikisource:mul instantiated using different code "-"

>>> s
APISite("mul", "wikisource")
>>> s.hostname()
'wikisource.org'

### 'al' fails but redirection like in '-' should be expected ###
>>> s = py.Site('al', 'wikisource')
Traceback (most recent call last):
  File "<pyshell#22>", line 1, in <module>
    s = py.Site('al', 'wikisource')
  File "C:\pwb\GIT\core\pywikibot\__init__.py", line 1271, in Site
    _sites[key] = interface(code=code, fam=fam, user=user, sysop=sysop)
  File "C:\pwb\GIT\core\pywikibot\site.py", line 1832, in __init__
    BaseSite.__init__(self, code, fam, user, sysop)
  File "C:\pwb\GIT\core\pywikibot\site.py", line 775, in __init__
    raise UnknownSite("Language '%s' does not exist in family %s"
pywikibot.exceptions.UnknownSite: Language 'al' does not exist in family wikisource

I haven't thought deeply about the implementation. I just proposed fixing the test so that it no longer fails (to get Travis running again), because the problem is known and tracked here to be solved eventually (or in several steps if necessary).

Change 563714 merged by jenkins-bot:
[pywikibot/core@master] [tests] Skip redirected urls in generate_family_files_tests

https://gerrit.wikimedia.org/r/563714

Xqt lowered the priority of this task from High to Low. Jan 21 2020, 5:40 AM

Okay, I investigated this and it seems code_aliases or interwiki_replacements are shared between all families of the wikimedia_family class. This is bad, because if I add specific code_aliases to one such family, they are shared with all of them. Therefore specific code_aliases for Wikisource cause non-existing language codes to be redirected to mul even for Wikipedia! (see https://travis-ci.org/dvorapa/pywikibot/jobs/654053005 for more details)
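One way this kind of sharing can arise in Python is a mutable class attribute that subclasses modify in place. The sketch below only illustrates that general mechanism; the class names and alias entries are made up, and it is not confirmed to be the actual cause in pywikibot.

```python
class Family:
    # One dict object, created once on the base class (illustrative entry).
    code_aliases = {'dk': 'da'}


class WikipediaFamily(Family):
    pass


class SharedWikisourceFamily(Family):
    pass


class IsolatedWikisourceFamily(Family):
    # Building a new dict gives this subclass its own aliases.
    code_aliases = dict(Family.code_aliases, aa='mul')


# The isolated subclass does not affect its siblings.
assert 'aa' not in WikipediaFamily.code_aliases

# Mutating in place through a subclass changes the base class dict,
# so every other subclass sees the new alias as well.
SharedWikisourceFamily.code_aliases['aa'] = 'mul'
assert WikipediaFamily.code_aliases['aa'] == 'mul'
assert SharedWikisourceFamily.code_aliases is Family.code_aliases
```

Assigning a fresh dict in the subclass body, as `IsolatedWikisourceFamily` does, is the usual way to avoid the leak.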

Weirdly enough, this happens only in tests, so there is some issue with aspects.py

Change 573992 had a related patch set uploaded (by Dvorapa; owner: Dvorapa):
[pywikibot/core@master] [bugfix] Fix www Wikisource aliases

https://gerrit.wikimedia.org/r/573992

Thanks for your patch. The other way, finding a site by its URL, still does not work:

>>> s = py.Site(url='https://mul.wikisource.org/wiki/')
Traceback (most recent call last):
  File "<pyshell#41>", line 1, in <module>
    s = py.Site(url='https://mul.wikisource.org/wiki/')
  File "C:\pwb\GIT\core\pywikibot\tools\__init__.py", line 1790, in wrapper
    return obj(*__args, **__kw)
  File "C:\pwb\GIT\core\pywikibot\__init__.py", line 1235, in Site
    code, fam = _code_fam_from_url(url)
  File "C:\pwb\GIT\core\pywikibot\__init__.py", line 1189, in _code_fam_from_url
    raise SiteDefinitionError("Unknown URL '{0}'.".format(url))
pywikibot.exceptions.SiteDefinitionError: Unknown URL 'https://mul.wikisource.org/wiki/'.

>>> s = py.Site(url='https://lbe.wikisource.org/wiki/')
Traceback (most recent call last):
  File "<pyshell#44>", line 1, in <module>
    s = py.Site(url='https://lbe.wikisource.org/wiki/')
  File "C:\pwb\GIT\core\pywikibot\tools\__init__.py", line 1790, in wrapper
    return obj(*__args, **__kw)
  File "C:\pwb\GIT\core\pywikibot\__init__.py", line 1235, in Site
    code, fam = _code_fam_from_url(url)
  File "C:\pwb\GIT\core\pywikibot\__init__.py", line 1189, in _code_fam_from_url
    raise SiteDefinitionError("Unknown URL '{0}'.".format(url))
pywikibot.exceptions.SiteDefinitionError: Unknown URL 'https://lbe.wikisource.org/wiki/'.

but

>>> s = py.Site(url='https://wikisource.org/wiki/')
>>> s
APISite("mul", "wikisource")

Change 573992 merged by jenkins-bot:
[pywikibot/core@master] [bugfix] Fix mul Wikisource aliases

https://gerrit.wikimedia.org/r/573992

Okay, there is still an issue with Site(url=redir), but at least we do have all redirs added to family.

There are multiple possible solutions. I can think of two at least:

  • add code_aliases to family.langs (this would mean many redirected urls in family.langs which seems wrong to me)
  • create separate redirect domain list like family.redirect_domains or family.redirect_langs
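The second option could look roughly like the sketch below. Everything here is hypothetical: `WikisourceFamilySketch` and `code_from_url` are invented for illustration, `redirect_domains` is only the attribute name proposed above, and the listed hosts are taken from the examples in this task rather than from the real family file.

```python
from urllib.parse import urlparse


class WikisourceFamilySketch:
    """Illustrative family with a separate list of redirecting domains."""

    name = 'wikisource'

    # Sites that really exist: code -> hostname.
    langs = {
        'de': 'de.wikisource.org',
        'mul': 'wikisource.org',
    }

    # Proposed attribute: hostname that only redirects -> target code.
    redirect_domains = {
        'gd.wikisource.org': 'mul',
        'lbe.wikisource.org': 'mul',
        'mul.wikisource.org': 'mul',
    }

    @classmethod
    def code_from_url(cls, url):
        """Return the site code for url, following known redirects."""
        host = urlparse(url).netloc
        for code, domain in cls.langs.items():
            if host == domain:
                return code
        if host in cls.redirect_domains:
            return cls.redirect_domains[host]
        raise ValueError("Unknown URL '{0}'.".format(url))
```

Keeping the redirects out of `langs` means code that iterates over real sites is unaffected, while URL lookup can still resolve a redirecting host to its target.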