
Scripts fail if only family:wikidata is specified
Open, Lowest, Public

Description

python pwb.py listpages.py -family:wikidata -start:A

Traceback (most recent call last):
  File "pwb.py", line 166, in <module>
    run_python_file(fn, argv, argvu)
  File "pwb.py", line 67, in run_python_file
    exec(compile(source, filename, "exec"), main_mod.__dict__)
  File "scripts/listpages.py", line 58, in <module>
    main()
  File "scripts/listpages.py", line 35, in main
    local_args = pywikibot.handleArgs(*args)
  File "/home/user/python/core/pywikibot/bot.py", line 638, in handleArgs
    init_handlers()
  File "/home/user/python/core/pywikibot/bot.py", line 246, in init_handlers
    writelogheader()
  File "/home/user/python/core/pywikibot/bot.py", line 257, in writelogheader
    site = pywikibot.Site()
  File "/home/user/python/core/pywikibot/__init__.py", line 527, in Site
    _sites[key] = __Site(code=code, fam=fam, user=user, sysop=sysop)
  File "/home/user/python/core/pywikibot/site.py", line 636, in __init__
    BaseSite.__init__(self, code, fam, user, sysop)
  File "/home/user/python/core/pywikibot/site.py", line 167, in __init__
    % (self.__code, self.__family.name))
pywikibot.exceptions.NoSuchSite: Language en does not exist in family wikidata
<class 'pywikibot.exceptions.NoSuchSite'>
CRITICAL: Waiting for 1 network thread(s) to finish. Press ctrl-c to abort

In site.py, the following check fails:

    if (self.__family.name in list(self.__family.langs.keys()) and
            len(self.__family.langs) == 1):

It fails because len(self.__family.langs) is not 1. For wikidata, langs is:

{'test': 'test.wikidata.org', 'wikidata': 'www.wikidata.org'}
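The failure can be reproduced outside pywikibot. The sketch below is standalone and uses the `langs` dict printed above. `single_code_family` is a hypothetical helper that mirrors the site.py condition. It is not a pywikibot function.

```python
# Standalone replay of the site.py check; no pywikibot import needed.
# `langs` is the wikidata family's langs dict quoted above.
langs = {'test': 'test.wikidata.org', 'wikidata': 'www.wikidata.org'}


def single_code_family(family_name, langs):
    """Hypothetical helper mirroring the condition in site.py: the family
    name must be one of the codes AND the family must have exactly one code."""
    return family_name in list(langs.keys()) and len(langs) == 1


# 'wikidata' is a valid code, but the extra 'test' entry makes len(langs) == 2,
# so the implicit fallback to the family-named code is never taken.
print('wikidata' in langs)                    # True
print(len(langs))                             # 2
print(single_code_family('wikidata', langs))  # False
```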

Details

Reference: bz69255

Event Timeline

bzimport raised the priority of this task to Lowest.Nov 22 2014, 3:28 AM
bzimport set Reference to bz69255.
Mpaa created this task.Aug 7 2014, 7:23 PM

Do you really think this needs fixing? There is a test repo (two wikis instead of one), so it's unreasonable to fix it.

Mpaa added a comment.Aug 13 2014, 6:41 PM

There is an inconsistent status, so I would fix this.
It's up to you and the others how to move forward.

(In reply to Mpaa from comment #2)

I agree this should be fixed.

Something like this should be OK; there is very little chance of it causing problems with custom family files.

  - len(self.__family.langs) == 1):
  + len(set(self.__family.langs) - {'test'}) == 1):
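If taken literally, `self.__family.langs - ('test')` would raise a TypeError. `('test')` is just the string `'test'`, not a tuple, and a dict does not support `-`. Converting the keys to a set gives the intended behaviour. This is a standalone sketch. `langs` is copied from the description, and `is_effectively_single` is a hypothetical name.

```python
langs = {'test': 'test.wikidata.org', 'wikidata': 'www.wikidata.org'}

# Taken literally, dict - str is not supported and raises TypeError.
try:
    langs - ('test')  # ('test') is the plain string 'test', not a tuple
except TypeError as exc:
    print('TypeError:', exc)


def is_effectively_single(family_name, langs, ignored=frozenset({'test'})):
    """Hypothetical variant of the site.py check that ignores 'test' codes."""
    real_codes = set(langs) - ignored
    return family_name in real_codes and len(real_codes) == 1


print(is_effectively_single('wikidata', langs))  # True
```

Building the set from the dict keys keeps the change local to the comparison. Custom family files without a 'test' code behave exactly as before.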

jayvdb moved this task from Backlog to Framework on the Pywikibot-Wikidata board.Nov 30 2014, 12:09 PM

This has been proposed to be investigated as part of a Google-Code-in-2014 task.
https://www.google-melange.com/gci/task/view/google/gci2014/5826944515964928

Change 179586 had a related patch set uploaded (by M4tx):
Implement wbsearchentities

https://gerrit.wikimedia.org/r/179586

Patch-For-Review

Change 179599 had a related patch set uploaded (by M4tx):
Fix NoSuchSite error on multi-lang sites that have 'test' language family.

https://gerrit.wikimedia.org/r/179599

Patch-For-Review

Change 179586 had a related patch set uploaded (by M4tx):
Implement wbsearchentities
https://gerrit.wikimedia.org/r/179586
Patch-For-Review

Unrelated: change 179586 does not address this task.

Ricordisamoa set Security to None.Dec 13 2014, 1:03 PM
Ricordisamoa removed a subscriber: gerritbot.
XZise added a subscriber: XZise.Dec 13 2014, 3:51 PM

If I understand the problem correctly, it is that when a family contains only one language (e.g. 'commons', in contrast to 'wikipedia'), the -lang parameter should be optional.

I'd then suggest that the family itself declare whether it has a primary code. That code is added to 'langs', while 'langs' otherwise contains mostly unused codes (usually only 'test'). If langs contains more than one element, the check could then be whether the primary code is non-empty.

Exactly what I proposed on https://gerrit.wikimedia.org/r/179599 :-)
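The primary-code idea can be sketched roughly as follows. The sketch assumes a hypothetical `primary_code` class attribute on the family. `Family`, `WikidataFamily`, `WikipediaFamily` and `implicit_code` are illustrative stand-ins and are not pywikibot's real family classes.

```python
class Family:
    """Minimal stand-in for a family file (not pywikibot's Family class)."""
    name = ''
    langs = {}
    primary_code = ''  # hypothetical attribute: empty means "no primary code"


class WikidataFamily(Family):
    name = 'wikidata'
    langs = {'wikidata': 'www.wikidata.org', 'test': 'test.wikidata.org'}
    primary_code = 'wikidata'


class WikipediaFamily(Family):
    name = 'wikipedia'
    langs = {'en': 'en.wikipedia.org', 'de': 'de.wikipedia.org'}


def implicit_code(family):
    """Return the code to use when no -lang is given, or None."""
    if len(family.langs) == 1:
        # Single-code family: the only code is unambiguous.
        return next(iter(family.langs))
    if family.primary_code:
        # Several codes, but the family declares which one is primary.
        return family.primary_code
    return None  # the caller must require an explicit -lang


print(implicit_code(WikidataFamily))   # wikidata
print(implicit_code(WikipediaFamily))  # None
```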

Xqt added a subscriber: Xqt.Dec 16 2014, 7:58 AM

When the family contains only one language (e.g. 'commons', in contrast to 'wikipedia'), the -lang parameter _is_ already optional! This bug is invalid IMHO.

But there is basically only one wikidata (plus one minor test wikidata), just as there is only one commons.

I think we should scrap the 'choose the single language if only one language is available' magic rather than make it even more magical by ignoring 'test' wikis. If we want to keep the behavior, we should make it explicit instead of depending on 'we have only a single language' (which is basically what M4tx implemented).

Ricordisamoa reassigned this task from Ladsgroup to m4tx.Dec 18 2014, 9:01 AM

Assigning to the patch uploader.

jayvdb updated the task description.Dec 18 2014, 10:34 AM
jayvdb added a project: Google-Code-in-2014.

I've thought a bit about what a sensible user interface would look like.

First, an assumption: I think our code internally never uses the 'en:wikipedia' -> set family to 'commons' -> 'en:commons' -> 'commons:commons' magic. After all, otherwise wikidata wouldn't work at all. If this is not the case, I think we should make it the case.

Then the only interface we have is the command line, where one can pass

  1. -family:XX -lang:YY, or
  2. -lang:YY (implicit family), or
  3. -family:XX (implicit lang)

My suggested behavior would be:

  1. Always explicitly defines YY:XX. If that doesn't exist, we should raise an exception. So:
    • -family:wikidata -lang:wikidata gives wikidata:wikidata, but
    • -family:wikidata -lang:en raises an exception
  2. is the same as (1), with the family taken from user-config.py.
  3. a) If the family file does not specify a default, use the mylang specified in the user-config file, i.e.
    • mylang=en, -family:wiktionary --> en.wiktionary
    • mylang=ru, -family:myrandomwiki (where myrandomwiki does not specify a default and does not have a ru site) --> error
     b) If the family file specifies a default, always use that default, so
    • mylang=test, -family:wikidata --> wikidata:wikidata [we can still reach test.wikidata with -family:wikidata -lang:test]
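The rules above can be sketched as a single resolution function. This is a hypothetical illustration and not pywikibot code. `resolve_site`, the `FAMILIES` table and its `default` entries are assumed names, and `NoSuchSite` is a local stand-in for pywikibot's exception. `mylang` plays the role of the user-config.py setting, and `lang=None` means -lang was not given.

```python
class NoSuchSite(Exception):
    """Stand-in for pywikibot.exceptions.NoSuchSite."""


# Hypothetical family table: available codes plus an optional default code.
FAMILIES = {
    'wikidata': {'codes': {'wikidata', 'test'}, 'default': 'wikidata'},
    'wiktionary': {'codes': {'en', 'ru', 'de'}, 'default': None},
    'myrandomwiki': {'codes': {'en'}, 'default': None},
}


def resolve_site(family, lang=None, mylang='en'):
    """Return (code, family) following the proposed rules 1-3."""
    info = FAMILIES[family]
    if lang is not None:
        code = lang             # rules 1/2: an explicit code is always honoured
    elif info['default']:
        code = info['default']  # rule 3b: the family default wins over mylang
    else:
        code = mylang           # rule 3a: fall back to mylang from user-config.py
    if code not in info['codes']:
        raise NoSuchSite(f'Language {code} does not exist in family {family}')
    return code, family


print(resolve_site('wikidata', mylang='test'))  # ('wikidata', 'wikidata')
print(resolve_site('wikidata', 'test'))         # ('test', 'wikidata')
```

Rule 2 needs no separate branch in this sketch. The caller simply passes the family read from user-config.py.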

For specifying what the default is, I think m4tx's implementation makes sense.

XZise added a subscriber: m4tx.Dec 18 2014, 1:56 PM

That makes more sense than the current implementation and would improve @m4tx's implementation. The problem with “what is default?” still remains.

About the first suggested behavior: it's currently not possible to determine where the language was defined, i.e. whether it came from the config or the command line. Also, 'mylang=test; -family:wikidata' resolving to 'wikidata:wikidata' could be confusing, so maybe this case should be highlighted: when a family provides a default and the configured language is valid for that family but is not the default.

Xqt added a comment.Dec 18 2014, 2:12 PM

Why not use a dict for default sites in user-config, like:

    default_sites = {
        'wikipedia': 'de',
        'wikisource': 'en',
        'wikidata': 'test',
        'myownproject': 'klingon',
    }
    family = 'wikipedia'
    mylang = None  # maybe obsolete now

The -family option will use the default language code unless the -lang option is given or there is only one language in that project.
This means:

  1. -family overrides config.family
  2. if -lang is given, take it and raise an error if site does not exist, otherwise
  3. if default_sites[family] is given, take that language code and raise an error if site does not exist, otherwise
  4. take mylang (as fallback, maybe deprecated) and raise an error if site does not exist.

This gives bot operators the choice of a default site for each project. If none is defined, an error is the right hint. And there would no longer be any surprise about which site is used.
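Xqt's lookup order could be sketched as below. This is only a sketch under stated assumptions. `pick_site`, `KNOWN_CODES` and the `config` dict (standing in for user-config.py) are hypothetical names, and `NoSuchSite` is a local stand-in. The sketch follows the four numbered steps only. It leaves out the "only one language in that project" shortcut.

```python
class NoSuchSite(Exception):
    """Stand-in for pywikibot.exceptions.NoSuchSite."""


# Hypothetical table of existing sites per family (not pywikibot data).
KNOWN_CODES = {
    'wikipedia': {'de', 'en', 'test'},
    'wikisource': {'de', 'en'},
    'wikidata': {'wikidata', 'test'},
}


def pick_site(config, family_arg=None, lang_arg=None):
    """Resolve (code, family) in the proposed order; raise if the site is unknown."""
    family = family_arg or config['family']  # 1. -family overrides config.family
    code = (lang_arg                                         # 2. explicit -lang
            or config.get('default_sites', {}).get(family)   # 3. per-family default
            or config.get('mylang'))                         # 4. mylang fallback
    if code is None or code not in KNOWN_CODES.get(family, set()):
        raise NoSuchSite(f'Language {code!r} does not exist in family {family}')
    return code, family


user_config = {
    'default_sites': {'wikipedia': 'de', 'wikisource': 'en', 'wikidata': 'test'},
    'family': 'wikipedia',
    'mylang': None,
}
print(pick_site(user_config))                         # ('de', 'wikipedia')
print(pick_site(user_config, family_arg='wikidata'))  # ('test', 'wikidata')
```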

IMO BaseSite.__init__ shouldn't be where the lang/code is auto-guessed. This should be done in the pywikibot.Site factory function, with some help from the command-line argument parsing routines if required.

An approach I have been mulling over: the default site (URL) for any family is the one whose code is the same as the family name, e.g. 'wikidata:wikidata', 'commons:commons'. This is only *necessary* where the family has multiple codes, but it would be good to make the rule universal. That would mean changing the code of some sites: the wikitech family's only site would change from 'en' to 'wikitech', and the osm family needs the same change. lyricwiki could also be changed. However, this family has other languages that are not in the family file, so I'd suggest not touching that one.

Then -family:wikidata on the command line would implicitly mean -lang:wikidata as well. To use test.wikidata via the command line, it must be requested explicitly, i.e. -family:wikidata -lang:test.
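jayvdb's "default code equals family name" rule could look roughly like this. It also moves the guessing out of BaseSite.__init__ into a Site()-style factory. All names are hypothetical: `site_factory` and `FAMILY_CODES` are illustrative, and `NoSuchSite` is a local stand-in. The 'wikitech' entry assumes the proposed rename of its code from 'en' to 'wikitech'.

```python
class NoSuchSite(Exception):
    """Stand-in for pywikibot.exceptions.NoSuchSite."""


# Hypothetical family table; 'wikitech' assumes the proposed code rename.
FAMILY_CODES = {
    'wikidata': {'wikidata', 'test'},
    'commons': {'commons'},
    'wikitech': {'wikitech'},
    'wikipedia': {'en', 'de', 'test'},
}


def site_factory(family, code=None):
    """Fill in the code before any site object is built, so the site
    constructor itself never has to guess it."""
    if code is None:
        # Rule: a family's default site is the one whose code equals the family name.
        code = family
    if code not in FAMILY_CODES.get(family, set()):
        raise NoSuchSite(f'Language {code} does not exist in family {family}')
    return code, family


print(site_factory('wikidata'))          # ('wikidata', 'wikidata')
print(site_factory('wikidata', 'test'))  # ('test', 'wikidata')
```

With this rule, `site_factory('wikipedia')` raises, because 'wikipedia' is not a code of that family. For multi-language families like that one, a code has to be given explicitly.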

XZise added a comment.Dec 18 2014, 2:24 PM

I like @Xqt's idea of having different default codes for different sites and I agree with @jayvdb that the logic shouldn't be in the BaseSite.__init__.

Change 179599 abandoned by M4tx:
Fix error on multi-lang sites with invalid lang set.

https://gerrit.wikimedia.org/r/179599

Xqt removed m4tx as the assignee of this task.Nov 21 2017, 1:08 PM
Xqt claimed this task.

Does this issue also occur on Commons?

I am getting this error:

pywikibot.exceptions.UnknownSite: Language 'en' does not exist in family commons

D3r1ck01 moved this task from Backlog to Needs Review on the Pywikibot board.Nov 5 2018, 11:38 AM