
Unable to communicate with obsolete / non-existing wikis
Open, MediumPublic

Description

za.wiktionary has been closed and is now in the obsolete list of the wiktionary family, but not in its langs list. As a result, when a Site object is created for that site, it cannot communicate with the server:

D:\Py\rewrite>pwb.py interwiki -family:wiktionary -cleanup -start:category:Finština -array:50 -query:30 -untranslated

NOTE: Number of pages queued is 0, trying to add 30 more.
Retrieving 30 pages from wiktionary:cs.
ERROR: Traceback (most recent call last):
  File "D:\Py\rewrite\pywikibot\data\api.py", line 584, in submit
    headers=headers, body=body)
  File "D:\Py\rewrite\pywikibot\tools.py", line 549, in wrapper
    return obj(*__args, **__kw)
  File "D:\Py\rewrite\pywikibot\comms\http.py", line 232, in request
    host = site.ssl_hostname()
  File "D:\Py\rewrite\pywikibot\site.py", line 550, in <lambda>
    f = lambda *args, **kwargs: method(self.code, *args, **kwargs)
  File "D:\Py\rewrite\pywikibot\family.py", line 994, in ssl_hostname
    return self.hostname(code)
  File "D:\Py\rewrite\pywikibot\family.py", line 990, in hostname
    return self.langs[code]
KeyError: u'za'

WARNING: Waiting 5 seconds before retrying.
...
WARNING: Waiting 10 seconds before retrying.

See Also:
T73115: Mark Zhuang wiktionary (za) as obsolete

Event Timeline

bzimport raised the priority of this task from to Needs Triage. Nov 22 2014, 3:44 AM
bzimport set Reference to bz72674.
bzimport added a subscriber: Unknown Object (????).

I found a connection with https://bugzilla.wikimedia.org/show_bug.cgi?id=71115

When a page contains an old interwiki link to za:, the bot has this problem and cannot continue.

After the edit https://cs.wiktionary.org/w/index.php?title=Kategorie:Francouzština&diff=prev&oldid=531361 the bot works as expected.

Similar problem with language on incubator

(In reply to JAn Dudík from comment #2)

Similar problem with language on incubator

I mean that when a page contains an interwiki link to a non-existing wiktionary ([[wikt:bar:Category:Foo]] exists neither on its own nor in the Incubator), the bot crashes.

example:
https://csb.wiktionary.org/w/index.php?title=Kategòrëjô:Jãzëczi&curid=1825&diff=28265&oldid=28259

...
Retrieving 30 pages from wiktionary:de.
Dump cs (wiktionary) written.
Traceback (most recent call last):

File "D:\Py\rewrite\pwb.py", line 178, in <module>
  run_python_file(fn, argv, argvu)
File "D:\Py\rewrite\pwb.py", line 75, in run_python_file
  exec(compile(source, filename, "exec"), main_mod.__dict__)
File "D:\Py\rewrite\scripts\interwiki.py", line 2646, in <module>
  main()
File "D:\Py\rewrite\scripts\interwiki.py", line 2621, in main
  bot.run()
File "D:\Py\rewrite\scripts\interwiki.py", line 2365, in run
  self.queryStep()
File "D:\Py\rewrite\scripts\interwiki.py", line 2338, in queryStep
  self.oneQuery()
File "D:\Py\rewrite\scripts\interwiki.py", line 2328, in oneQuery
  for page in gen:
File "D:\Py\rewrite\pywikibot\site.py", line 2430, in preloadpages
  api.update_page(page, pagedata, rvgen.props)
File "D:\Py\rewrite\pywikibot\data\api.py", line 1480, in update_page
  source=page.site)
File "D:\Py\rewrite\pywikibot\page.py", line 4371, in langlinkUnsafe
  link._site = pywikibot.Site(lang, source.family.name)
File "D:\Py\rewrite\pywikibot\__init__.py", line 573, in Site
  _sites[key] = interface(code=code, fam=fam, user=user, sysop=sysop)
File "D:\Py\rewrite\pywikibot\site.py", line 1399, in __init__
  BaseSite.__init__(self, code, fam, user, sysop)
File "D:\Py\rewrite\pywikibot\site.py", line 439, in __init__
  % (self.__code, self.__family.name))

pywikibot.exceptions.UnknownSite: Language ht does not exist in family wiktionary
<class 'pywikibot.exceptions.UnknownSite'>
CRITICAL: Waiting for 1 network thread(s) to finish. Press ctrl-c to abort

The bot should ignore <s>or remove</s> such links (ht:wiktionary exists on the Incubator).

What does the link look like in the incubator?

(In reply to Fabian from comment #5)

What does the link look like in the incubator?

[[wikt:ht:Kategori:Lang]] -> [[incubator:Wt/ht/Kategori:Lang]]

https://de.wiktionary.org/w/index.php?title=Kategorie:Sprachen&diff=prev&oldid=4000105

Ah, interesting. wikt:ht:X in the English Wikipedia is the same as ht:X in the English Wiktionary, and while it doesn't show an 'ht' interwiki prefix, the API does:

http://en.wiktionary.org/w/api.php?action=query&meta=siteinfo&siprop=interwikimap

If you search for '"ht"' (with the double quotes) you find an entry which links to ht.wiktionary.org which itself redirects to the incubator.
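The lookup described above can be sketched in a few lines. This is only an illustration, assuming the JSON form of the siteinfo interwikimap response (a list of entries with "prefix" and "url" keys); the function name interwiki_host and the sample data are made up for this example and are not part of pywikibot.

```python
from typing import Optional
from urllib.parse import urlparse


def interwiki_host(siteinfo: dict, prefix: str) -> Optional[str]:
    """Return the host an interwiki prefix points to, or None if it is absent."""
    for entry in siteinfo.get("query", {}).get("interwikimap", []):
        if entry.get("prefix") == prefix:
            # The URL looks like https://ht.wiktionary.org/wiki/$1
            return urlparse(entry["url"]).netloc
    return None


# Hand-written sample in the shape of the API's JSON output (not real data).
sample = {
    "query": {
        "interwikimap": [
            {"prefix": "de", "url": "https://de.wiktionary.org/wiki/$1"},
            {"prefix": "ht", "url": "https://ht.wiktionary.org/wiki/$1"},
        ]
    }
}

print(interwiki_host(sample, "ht"))
```

The map only reports the host the prefix points to. It cannot tell that the host later redirects to the Incubator, which is why the family definitions would still need that information.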

When pwb now analyses the link, it thinks 'ht.wiktionary.org' is a wiktionary although it's on the Incubator. So somehow our family system needs to identify this, because I don't know if a script could figure it out on its own. So, similar to the current obsolete system where one old language refers to a new one, the new system needs to refer not only to a new language but also to a new site.

What we can't do is use 'ht.wiktionary.org' as if nothing is happening:

http://ht.wiktionary.org/w/api.php?action=query&meta=siteinfo&siprop=interwikimap

does not return an API response.

Aside from the family-system issues, why does pywikibot.exceptions.UnknownSite stop the bot? Can't it just ask whether to ignore or remove the link and go on, like JAnD said?
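A minimal sketch of the "ignore and go on" behaviour asked about above. Everything here is a stand-in written for illustration: the local UnknownSite class mimics pywikibot.exceptions.UnknownSite, and KNOWN_CODES, make_site and resolve_langlinks are hypothetical names, not code from interwiki.py.

```python
class UnknownSite(Exception):
    """Local stand-in for pywikibot.exceptions.UnknownSite."""


# Hypothetical set of codes the family knows about.
KNOWN_CODES = {"cs", "de", "en"}


def make_site(code, family):
    """Hypothetical site factory that rejects unknown language codes."""
    if code not in KNOWN_CODES:
        raise UnknownSite(
            f"Language {code} does not exist in family {family}")
    return (family, code)


def resolve_langlinks(codes, family):
    """Build sites for all codes, collecting unknown ones instead of crashing."""
    sites, skipped = [], []
    for code in codes:
        try:
            sites.append(make_site(code, family))
        except UnknownSite:
            # Remember the bad code so the operator can be told about it.
            skipped.append(code)
    return sites, skipped


sites, skipped = resolve_langlinks(["de", "ht", "en"], "wiktionary")
print(sites, skipped)
```

Collecting the skipped codes rather than silently dropping them would also let the bot report which langlink was the problematic one.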

XZise set Security to None.
XZise removed a subscriber: Unknown Object (????).
JAnD triaged this task as High priority. Jan 9 2015, 1:36 PM

Not only 'za', but all non-existing codes (somebody copied interwiki links from WP to Wikt => many unrecognized codes).

In T74674#965633, @JAnD wrote:

Not only 'za', but all non-existing codes (somebody copied interwiki links from WP to Wikt => many unrecognized codes).

Okay, za is a closed wiki, but its API still works like that of any other wiktionary. So to fix that, the Family class for wiktionary in pywikibot.families should support za in langs. I'm not sure whether there are obsolete codes for which this doesn't hold. If there are none, one fix would be to add the obsolete codes to langs, or to check not only langs but also obsolete in pywikibot.family.Family.hostname. Otherwise obsolete needs to distinguish between wikis that are just closed and wikis that were removed.

But your problem with ht is different (as explained above). As for why the script can't “just” do that: the file has 2k+ lines and, unlike pywikibot.site or pywikibot.page, which are also quite large, the script is one single section. So to understand it you have to know how all of it works.
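A rough sketch of the second option, a hostname lookup that also consults closed wikis. FamilySketch and its closed attribute are hypothetical and much simpler than the real pywikibot.family.Family; the hostnames are filled in only for the example.

```python
class FamilySketch:
    """Hypothetical, simplified stand-in for a pywikibot family definition."""

    def __init__(self, langs, closed):
        self.langs = dict(langs)    # code -> hostname of active wikis
        self.closed = dict(closed)  # code -> hostname of closed wikis still served

    def hostname(self, code):
        """Return the hostname for a code, also consulting closed wikis."""
        if code in self.langs:
            return self.langs[code]
        if code in self.closed:
            return self.closed[code]
        # Removed or never-existing wikis still fail, but with a clear message.
        raise KeyError(f"{code!r} is neither an active nor a closed wiki")


wiktionary = FamilySketch(
    langs={"cs": "cs.wiktionary.org", "de": "de.wiktionary.org"},
    closed={"za": "za.wiktionary.org"},
)
print(wiktionary.hostname("za"))
```

Keeping closed wikis in a separate mapping leaves room to treat removed wikis differently, which is the distinction raised at the end of the comment.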

XZise renamed this task from Key error: u'za' in interwiki.py to Unable to communicate with obsolete wikis. Jan 9 2015, 2:35 PM
XZise updated the task description.

And another problem is: should pywikibot.Site('za', 'wiktionary') actually be https://za.wiktionary.org/wiki/Yiebdaeuz or https://incubator.wikimedia.org/wiki/Wt/za? That is related to the second problem you mention here, because “ht.wiktionary.org” is redirecting instead of being closed (while “za.wiktionary.org” does not redirect).

In the Wikimedia family definitions for pywikibot, these are the 'obsolete' entries which map to another code, e.g. dk.wikipedia.org -> da.wikipedia.org, etc.

'dk': 'da',
'jp': 'ja',
'mo': 'ro',
'nl_nds': 'nl-nds',  # misspelling
'nb': 'no',
'minnan': 'zh-min-nan',
'zh-cn': 'zh',
'zh-tw': 'zh',

For each entry, its purpose needs to be established. There may be hints in the original commit, but it will be easier if some of the old hands help explain why these subdomains existed. Specifically we need to establish whether the old code:
a) is present in current wiki pages as interwiki links (high priority)
b) was ever a valid interwiki link, and therefore will exist in old revisions (lower priority)
c) was mapped due to a MediaWiki language change, or only a Wikimedia subdomain choice, i.e. was "nl_nds" a language code in MediaWiki in the past that has since been updated in the software?
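For reference, a small sketch of how such a mapping is applied when a link with an old code is encountered. The dictionary repeats the entries listed above; the names OBSOLETE and canonical_code are made up for this illustration and are not pywikibot's API.

```python
# Old code -> current code, as listed in the comment above.
OBSOLETE = {
    'dk': 'da',
    'jp': 'ja',
    'mo': 'ro',
    'nl_nds': 'nl-nds',
    'nb': 'no',
    'minnan': 'zh-min-nan',
    'zh-cn': 'zh',
    'zh-tw': 'zh',
}


def canonical_code(code):
    """Map an obsolete code to its replacement; other codes pass through."""
    return OBSOLETE.get(code, code)


print(canonical_code('nb'), canonical_code('de'))
```

Codes such as za or ht are not covered by a mapping like this, since they have no replacement code within the same family.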

I think these old codes (dk, jp etc.) are not used anywhere. But other codes (be-x-old, za, ik, ch...) are problematic: it is not possible to work on all categories on all wikis, because the bot crashes. And using compat instead is problematic too, because of a different bug. There are now only a few families (wiktionary, wikibooks, wikiversity) on WMF wikis which still use old-style interwiki. But at least wiktionary will not have Wikidata for a long time yet. So we need a functional and stable interwiki bot.

Specifically we need to establish whether the old code:
a) is present in current wiki pages as interwiki links (high priority)

Only nb and perhaps dk I think?

Change 210275 had a related patch set uploaded (by John Vandenberg):
Access closed wikis

https://gerrit.wikimedia.org/r/210275

Change 214816 had a related patch set uploaded (by John Vandenberg):
Do not ask for password if user doesnt exist

https://gerrit.wikimedia.org/r/214816

JAnD renamed this task from Unable to communicate with obsolete wikis to Unable to communicate with obsolete / non-existing wikis. Jun 15 2015, 9:47 AM

Change 214816 merged by jenkins-bot:
Do not ask for password if user does not exist

https://gerrit.wikimedia.org/r/214816

I think it would be nice if bots did not crash when they find a non-existing wiki or a link to a wiki that is not yet fully supported.

I/we filed a bug long ago about 'mai' because it breaks bot(s): https://phabricator.wikimedia.org/T76939

Either bots should be fixed to handle these issues or we should have a much faster way to add/remove wikis (a few days instead of several months).

I had this problem today with a langlink to gag.wiktionary (which happens to be in the Incubator).

Traceback (most recent call last):

File "D:\Work\pywikipedia\pwb.py", line 239, in <module>
  if not main():
File "D:\Work\pywikipedia\pwb.py", line 233, in main
  run_python_file(filename, [filename] + args, argvu, file_package)
File "D:\Work\pywikipedia\pwb.py", line 111, in run_python_file
  main_mod.__dict__)
File ".\scripts\interwiki.py", line 2641, in <module>
  main()
File ".\scripts\interwiki.py", line 2616, in main
  bot.run()
File ".\scripts\interwiki.py", line 2360, in run
  self.queryStep()
File ".\scripts\interwiki.py", line 2333, in queryStep
  self.oneQuery()
File ".\scripts\interwiki.py", line 2323, in oneQuery
  for page in gen:
File "D:\Work\pywikipedia\pywikibot\site.py", line 2886, in preloadpages
  api.update_page(page, pagedata, rvgen.props)
File "D:\Work\pywikipedia\pywikibot\data\api.py", line 3080, in update_page
  source=page.site)
File "D:\Work\pywikipedia\pywikibot\page.py", line 5081, in langlinkUnsafe
  link._site = pywikibot.Site(lang, source.family.name)
File "D:\Work\pywikipedia\pywikibot\__init__.py", line 615, in Site
  _sites[key] = interface(code=code, fam=fam, user=user, sysop=sysop)
File "D:\Work\pywikipedia\pywikibot\site.py", line 1638, in __init__
  BaseSite.__init__(self, code, fam, user, sysop)
File "D:\Work\pywikipedia\pywikibot\site.py", line 626, in __init__
  % (self.__code, self.__family.name))

pywikibot.exceptions.UnknownSite: Language 'gag' does not exist in family wiktionary
<class 'pywikibot.exceptions.UnknownSite'>
CRITICAL: Closing network session.

Currently, this is in my opinion the most annoying problem, because it breaks the bot run, and most of the time the operator cannot tell which page had the problematic langlink.

I'm not sure if an obsolete site is the same as a closed wiki, but in case of closed wikis, why not allow reads from it? Obviously, editing is not allowed, but read operations should be fine and would basically fix the problem.

We have patches in Gerrit to allow read access to closed wikis, and patches to automatically add new languages in a family. (And other family detection voodoo) Most are -1'd because they are not good enough yet, but we're getting close.

While we wait, submit patches to add languages as required.

Hello. I recently faced the same problem. The codes are 'lad' and 'vep' on wiktionary. Even though they are located on the Incubator, the links [lad:], [vep:] work, but the bot does not. I hope someone can fix this.

In https://gerrit.wikimedia.org/r/#/c/398690/ we discussed that pywikibot does not support read-only access and that there is literally no difference between closed and removed wikis: both just create a RemovedSite instance with no read-only actions allowed (such as the basic getter Page.text or the basic checker Page.exists()). Still a problem :/

PS: In that commit I wanted to access read-only Wikimania wikis (from 2005 to 2017) in some kind of read-only mode and the current one (2018) in read-write mode
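To make the closed-versus-removed distinction concrete, here is a sketch of what a read-only site could look like. ClosedSiteSketch, ReadOnlyError and the in-memory page store are hypothetical and only illustrate the idea; they are not how pywikibot implements sites.

```python
class ReadOnlyError(Exception):
    """Raised when a write is attempted on a closed (read-only) wiki."""


class ClosedSiteSketch:
    """Hypothetical closed wiki: reads are allowed, writes are refused."""

    def __init__(self, pages):
        self._pages = dict(pages)  # title -> wikitext

    def page_exists(self, title):
        """Read operation, comparable to a basic existence check."""
        return title in self._pages

    def page_text(self, title):
        """Read operation, comparable to a basic text getter."""
        return self._pages[title]

    def save_page(self, title, text):
        """Write operation: always refused on a closed wiki."""
        raise ReadOnlyError(f"cannot save {title!r}: wiki is closed")


site = ClosedSiteSketch({"Yiebdaeuz": "main page text"})
print(site.page_exists("Yiebdaeuz"), site.page_text("Yiebdaeuz"))
```

A removed wiki would refuse reads as well, so the two cases would need separate handling.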

Change 429875 had a related patch set uploaded (by Dvorapa; owner: Dvorapa):
[pywikibot/core@master] [WIP] Handle closed_wikis as read-only

https://gerrit.wikimedia.org/r/429875

Change 210275 abandoned by Dvorapa:
Access closed wikis and deprecate obsolete

Reason:
In favor of (WIP) https://gerrit.wikimedia.org/r/#/c/429875/

https://gerrit.wikimedia.org/r/210275

Change 517031 had a related patch set uploaded (by Dvorapa; owner: Dvorapa):
[pywikibot/core@master] Handle closed_wikis as read-only

https://gerrit.wikimedia.org/r/517031

Change 517031 merged by jenkins-bot:
[pywikibot/core@master] Handle closed_wikis as read-only

https://gerrit.wikimedia.org/r/517031

Okay, now closed (read-only) wikis can be read by any bot. What is still to be solved:

  • incubator interwiki redirects
  • skip non-existing codes in interwiki.py
  • support closed wikis also in interwiki.py
Dvorapa lowered the priority of this task from High to Medium. Jun 18 2019, 4:03 PM

Change 429875 abandoned by Dvorapa:
[WIP] Handle closed_wikis as read-only

Reason:
The patch has already been split into several smaller ones. The Beta Cluster part and the test codes part need a little rethinking, though.

https://gerrit.wikimedia.org/r/429875