test_page_from_repository fails with EntityTypeUnknownException on multiple sites
Closed, Resolved, Public

Description

https://travis-ci.org/wikimedia/pywikibot/jobs/633377685#L1496
https://api.travis-ci.org/v3/job/679015138/log.txt

======================================================================
ERROR: test_item (tests.site_tests.TestDataSitePreloading)
Test that ItemPage preloading works for Item objects.
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/home/travis/build/wikimedia/pywikibot/tests/site_tests.py", line 3358, in test_item
    for num in range(1, 6)]
  File "/home/travis/build/wikimedia/pywikibot/tests/site_tests.py", line 3358, in <listcomp>
    for num in range(1, 6)]
  File "/home/travis/build/wikimedia/pywikibot/pywikibot/page.py", line 4416, in __init__
    ns = site.item_namespace
  File "/home/travis/build/wikimedia/pywikibot/pywikibot/site.py", line 7668, in item_namespace
    self._item_namespace = self.get_namespace_for_entity_type('item')
  File "/home/travis/build/wikimedia/pywikibot/pywikibot/site.py", line 7657, in get_namespace_for_entity_type
    .format(self, entity_type))
pywikibot.exceptions.EntityTypeUnknownException: DataSite("wikidata", "wikidata") does not support entity type "item"

___________ TestCategoryFromWikibase.test_page_from_repository_de_wp ___________

self = <tests.site_tests.TestCategoryFromWikibase testMethod=test_page_from_repository_de_wp>

    def wrapped_method(self):
        sitedata = self.sites[key]
        self.site_key = key
        self.family = sitedata['family']
        self.code = sitedata['code']
        self.site = sitedata['site']
>       func(self, key)

tests/aspects.py:748:
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
tests/site_tests.py:3787: in test_page_from_repository
    page = site.page_from_repository(self.ITEM)
pywikibot/site.py:2814: in page_from_repository
    dp = pywikibot.ItemPage(repo, item)
pywikibot/page.py:4422: in __init__
    ns = site.item_namespace
pywikibot/site.py:7668: in item_namespace
    self._item_namespace = self.get_namespace_for_entity_type('item')
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

self = DataSite("wikidata", "wikidata"), entity_type = 'item'

    def get_namespace_for_entity_type(self, entity_type):
        """
        Return namespace for given entity type.

        @return: corresponding namespace
        @rtype: Namespace
        """
        if not hasattr(self, '_entity_namespaces'):
            self._cache_entity_namespaces()
        if entity_type in self._entity_namespaces:
            return self._entity_namespaces[entity_type]
        raise EntityTypeUnknownException(
            '{0!r} does not support entity type "{1}"'
>           .format(self, entity_type))
E       EntityTypeUnknownException: DataSite("wikidata", "wikidata") does not support entity type "item"

pywikibot/site.py:7657: EntityTypeUnknownException

Event Timeline

@Xqt @matej_suchanek The error occurs only intermittently, and when it does, the tests produce a huge number of errors at once. I think this is related to T242081, because 'item' is loaded into the cache by a WD namespaces request. WD requests have been failing with timeouts lately, which can leave the cache broken. That produces errors like this one, because there is no mechanism to check or repair either the cache or the namespaces.

But the code hasn't changed much in the mentioned patch, which is weird.

Okay, if DataSite.namespaces fails to get namespaces (due to timeout), this error is thrown, which is not a good behavior

So this is a duplicate of T242081, but the error message is not very useful.

Basically, _build_namespaces (https://phabricator.wikimedia.org/diffusion/PWBC/browse/master/pywikibot/site.py$2618) should check that all of the builtin_namespaces (https://phabricator.wikimedia.org/diffusion/PWBC/browse/master/pywikibot/site.py$406) are included; otherwise it should fail with an API request error instead of returning an empty list of namespaces and continuing!
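
A minimal sketch of such a check, written independently of pywikibot. REQUIRED_NAMESPACE_IDS, IncompleteNamespacesError and check_namespaces are hypothetical names, and the required ids 0 to 15 are an assumption standing in for the real builtin_namespaces table:

```python
# Hypothetical stand-in for the built-in namespace ids; assumed to be 0..15.
REQUIRED_NAMESPACE_IDS = frozenset(range(16))


class IncompleteNamespacesError(Exception):
    """Raised when a namespace response lacks required ids (hypothetical)."""


def check_namespaces(namespaces):
    """Return namespaces unchanged, or raise if required ids are missing.

    namespaces is any mapping keyed by integer namespace id.
    """
    missing = REQUIRED_NAMESPACE_IDS - set(namespaces)
    if missing:
        raise IncompleteNamespacesError(
            'namespace response is missing ids: {}'.format(sorted(missing)))
    return namespaces
```

With a check like this, an empty or truncated response would surface as an explicit error at the point where the namespaces are built, rather than later as an EntityTypeUnknownException.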

> Okay, if DataSite.namespaces fails to get namespaces (due to timeout), this error is thrown, which is not a good behavior

This means a general exception is passed silently I guess.

It seems so, as it then works with an empty/incomplete namespace list and produces the error message in the description.

@matej_suchanek Do you know any solution for this issue?

Summary: Wikidata maxlag produces an empty Wikidata namespace list. Therefore self._entity_namespaces (which depends on it) is empty/incomplete, and tests fail whenever the method that creates self._entity_namespaces fails due to maxlag.

> @matej_suchanek Do you know any solution for this issue?

Fail hard in tests when maxlag is too high (no silent errors and no empty responses). Or don't use maxlag (or relax it) on read requests.

@Xqt Could we somehow check whether self._entity_namespaces or the WD namespaces list is incomplete, and try to recreate it if so?

Perhaps we could check whether the cached WD namespaces list is empty or lacks the basic namespaces 0 to 15?
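
A rough sketch of what such a check-and-recreate step could look like, again independent of pywikibot. EntityNamespaceCache, its fetch callable and max_attempts are all hypothetical; fetch is assumed to return a mapping from entity type to namespace, which may come back empty under maxlag:

```python
class EntityNamespaceCache:
    """Cache entity namespaces and refetch while the cache is incomplete."""

    def __init__(self, fetch, required=('item',), max_attempts=3):
        self._fetch = fetch                  # hypothetical loader callable
        self._required = frozenset(required)
        self._max_attempts = max_attempts
        self._cache = None

    def _is_complete(self):
        # Complete means: loaded at least once and no required type missing.
        return (self._cache is not None
                and self._required.issubset(self._cache))

    def get(self, entity_type):
        attempts = 0
        # Recreate the cache instead of trusting an empty or partial one.
        while not self._is_complete() and attempts < self._max_attempts:
            self._cache = dict(self._fetch())
            attempts += 1
        if not self._is_complete():
            raise RuntimeError(
                'entity namespaces still incomplete after {} attempts'
                .format(attempts))
        return self._cache[entity_type]
```

The attempt limit keeps a persistently lagged server from causing an endless loop; after the limit, the failure is reported as a loading problem rather than as an unsupported entity type.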

Summary: Wikidata maxlag produces empty Wikidata namespace list

I am wondering. I get a TimeoutError in that case, and SkipTest should then skip the test too.
So why is the exception not raised? Probably there is no maxlag timeout from wikibase; instead _build_namespaces() just yields an empty or insufficient NamespacesDict.

>>> import pwb, pywikibot as py
>>> s = py.Site()
>>> r = s.data_repository()
>>> ns = r.namespaces
Sleeping for 5.0 seconds, 2020-03-09 17:53:50
Sleeping for 5.5 seconds, 2020-03-09 17:53:55
Sleeping for 8.5 seconds, 2020-03-09 17:54:01
Sleeping for 11.3 seconds, 2020-03-09 17:54:10
Sleeping for 14.2 seconds, 2020-03-09 17:54:22
Traceback (most recent call last):
  File "<pyshell#11>", line 1, in <module>
    ns = r.namespaces
  File "C:\pwb\GIT\core\pywikibot\site.py", line 1013, in namespaces
    self._namespaces = NamespacesDict(self._build_namespaces())
  File "C:\pwb\GIT\core\pywikibot\site.py", line 2647, in _build_namespaces
    for nsdata in self.siteinfo.get('namespaces', cache=False).values():
  File "C:\pwb\GIT\core\pywikibot\site.py", line 1683, in get
    preloaded = self._get_general(key, expiry)
  File "C:\pwb\GIT\core\pywikibot\site.py", line 1629, in _get_general
    default_info = self._get_siteinfo(props, expiry)
  File "C:\pwb\GIT\core\pywikibot\site.py", line 1552, in _get_siteinfo
    data = request.submit()
  File "C:\pwb\GIT\core\pywikibot\data\api.py", line 2258, in submit
    self._data = super(CachedRequest, self).submit()
  File "C:\pwb\GIT\core\pywikibot\data\api.py", line 2105, in submit
    raise MaxlagTimeoutError(msg)
pywikibot.exceptions.MaxlagTimeoutError: Maximum retries attempted due to maxlag without success.

If a maxlag timeout occurs, the tests are skipped as expected (see wp.de and wp.en), but it seems there is no maxlag for ws.it. The NamespacesDict is insufficient instead:

Validate page_from_repository on wikipedia:de ... Sleeping for 5.0 seconds, 2020-03-09 16:11:17
Sleeping for 5.0 seconds, 2020-03-09 16:11:22
Sleeping for 5.0 seconds, 2020-03-09 16:11:28
Sleeping for 5.0 seconds, 2020-03-09 16:11:33
Sleeping for 6.2 seconds, 2020-03-09 16:11:38
skipped 'Maximum retries attempted due to maxlag without success.'
 26.987s test_page_from_repository_en_wp (tests.site_tests.TestCategoryFromWikibase)
Validate page_from_repository on wikipedia:en ... Sleeping for 5.0 seconds, 2020-03-09 16:11:44
Sleeping for 5.0 seconds, 2020-03-09 16:11:49
Sleeping for 5.0 seconds, 2020-03-09 16:11:54
Sleeping for 5.0 seconds, 2020-03-09 16:12:00
Sleeping for 7.1 seconds, 2020-03-09 16:12:05
skipped 'Maximum retries attempted due to maxlag without success.'
 27.878s test_page_from_repository_it_ws (tests.site_tests.TestCategoryFromWikibase)
Validate page_from_repository on wikisource:it ... ERROR

This is a new API fault behaviour, IMHO.

@Ladsgroup: Any idea who upstream can help solve this malfunction?

Still waiting for a response

Asking the ws:it API for namespaces gives a broken list when the maxlag condition occurs: no maxlag/timeout error, and no empty list like in the other cases.

eprodromou subscribed.

What's the question that you're asking? Is there something we can do to help make this test pass?

Xqt claimed this task.

Solved upstream with T242081 I guess