Page MenuHomePhabricator

TestSiteGenerators.testLinkMethods is failing on Travis CI
Closed, ResolvedPublic

Description

______________________ TestSiteGenerators.testLinkMethods ______________________
self = <tests.site_tests.TestSiteGenerators testMethod=testLinkMethods>
    def testLinkMethods(self):
        """Test site methods for getting links to and from a page."""
        if self.site.family.name == 'wpbeta':
            raise unittest.SkipTest('Test fails on betawiki; T69931')
        mysite = self.get_site()
        mainpage = self.get_mainpage()
        backlinks = set(mysite.pagebacklinks(mainpage, namespaces=[0]))
        # only non-redirects:
        filtered = set(mysite.pagebacklinks(mainpage, namespaces=0,
                                            filterRedirects=False))
        # only redirects:
        redirs = set(mysite.pagebacklinks(mainpage, namespaces=0,
                                          filterRedirects=True))
        # including links to redirect pages (but not the redirects):
        indirect = set(mysite.pagebacklinks(mainpage, namespaces=[0],
                                            followRedirects=True,
                                            filterRedirects=False))
        self.assertEqual(filtered & redirs, set([]))
        self.assertEqual(indirect & redirs, set([]))
        self.assertLessEqual(filtered, indirect)
        self.assertLessEqual(filtered, backlinks)
        self.assertLessEqual(redirs, backlinks)
        self.assertLessEqual(
            backlinks,
            set(self.site.pagebacklinks(mainpage, namespaces=[0, 2])))
    
        # pagereferences includes both backlinks and embeddedin
        embedded = set(mysite.page_embeddedin(mainpage, namespaces=[0]))
        refs = set(mysite.pagereferences(mainpage, namespaces=[0]))
        self.assertTrue(backlinks.issubset(refs))
        self.assertTrue(embedded.issubset(refs))
        for bl in backlinks:
            self.assertIsInstance(bl, pywikibot.Page)
            self.assertIn(bl, refs)
        for ei in embedded:
            self.assertIsInstance(ei, pywikibot.Page)
            self.assertIn(ei, refs)
        for ref in refs:
            self.assertIn(ref, backlinks | embedded)
        # test embeddedin arguments
        self.assertTrue(embedded.issuperset(
            set(mysite.page_embeddedin(mainpage, filterRedirects=True,
                                       namespaces=[0]))))
        self.assertTrue(embedded.issuperset(
            set(mysite.page_embeddedin(mainpage, filterRedirects=False,
                                       namespaces=[0]))))
        self.assertTrue(embedded.issubset(
            set(mysite.page_embeddedin(mainpage, namespaces=[0, 2]))))
        links = set(mysite.pagelinks(mainpage))
        for pl in links:
            self.assertIsInstance(pl, pywikibot.Page)
        # test links arguments
        # TODO: There have been build failures because the following assertion
        # wasn't true. Bug: T92856
        # Example: https://travis-ci.org/wikimedia/pywikibot-core/jobs/54552081#L505
        namespace_links = set(mysite.pagelinks(mainpage, namespaces=[0, 1]))
        if namespace_links - links:
            unittest_print(
                'FAILURE wrt T92856:\nSym. difference: "{0}"'.format(
                    '", "'.join(
                        '{0}@{1}'.format(link.namespace(),
                                         link.title(withNamespace=False))
                        for link in namespace_links ^ links)))
        self.assertCountEqual(
            set(mysite.pagelinks(mainpage, namespaces=[0, 1])) - links, [])
        for target in mysite.preloadpages(mysite.pagelinks(mainpage,
                                                           follow_redirects=True,
                                                           total=5)):
            self.assertIsInstance(target, pywikibot.Page)
            self.assertFalse(target.isRedirectPage())
        # test pagecategories
        for cat in mysite.pagecategories(mainpage):
            self.assertIsInstance(cat, pywikibot.Category)
            for cm in mysite.categorymembers(cat):
                self.assertIsInstance(cat, pywikibot.Page)
        # test pageimages
        self.assertTrue(all(isinstance(im, pywikibot.FilePage)
                            for im in mysite.pageimages(mainpage)))
        # test pagetemplates
        self.assertTrue(all(isinstance(te, pywikibot.Page)
                            for te in mysite.pagetemplates(mainpage)))
        self.assertTrue(set(mysite.pagetemplates(mainpage)).issuperset(
>                       set(mysite.pagetemplates(mainpage, namespaces=[10]))))
E       AssertionError: False is not true
tests/site_tests.py:542: AssertionError

see e.g.: https://travis-ci.org/wikimedia/pywikibot-core/jobs/184654277#L3800

Event Timeline

This comment was removed by Magul.

Change 356835 had a related patch set uploaded (by Dalba; owner: Dalba):
[pywikibot/core@master] site_tests.testLinkMethods: Ignore errors caused by page content change

https://gerrit.wikimedia.org/r/356835

Dalba removed Dalba as the assignee of this task.Jun 5 2017, 11:09 AM
Dalba subscribed.
Dalba unsubscribed.

Change 357195 had a related patch set uploaded (by Merlijn van Deen; owner: Merlijn van Deen):
[pywikibot/core@master] Disable request caching on Travis

https://gerrit.wikimedia.org/r/357195

@Dalba

Is this patch supposed to resolve T92856/T153501 or is it a WIP? Because I do not see how disabling the cache will resolve the issue. Here is what is happening in T153501:

  1. The test suite fetches the list of main page templates (in any namespace) through the API.
  2. In the meanwhile a user makes an edit on the main page that adds Template:X to the main page.
  3. Test suite fetches the list of templates in the template namespace only.
  4. There will be an AssertionError because Template:X is found in the list templates in step 3, but not in in the list of step 1.

All of these steps may happen – no matter if the cache is disabled or not.

I'm well aware that there is still a potential for race conditions when the cache has been disabled. However, it becomes severely limited because the race condition now has to happen between the two requests on the same line:

self.assertTrue(set(mysite.pagetemplates(mainpage)).issuperset(
>                       set(mysite.pagetemplates(mainpage, namespaces=[10]))))

rather than (potentially) two different test functions that are executed at completely different times. That doesn't seem to be the case here (although test_API_limits_with_site_methods and testLinkMethods both call mysite.pagetemplates, they do so with slightly different parameters), but this does seem to be the case for T92856 (pagelinks is used all over the place).

it becomes severely limited because the race condition now has to happen between the two requests on the same line

I don't think so. The problem is not in the number of lines in between, but the time it takes to fetch the API calls. It'll take only a few milliseconds at most to get the results from cache. But the first uncached calls are the real costly ones. They should be orders of magnitude slower. That race condition is most likely to happen during API calls, eliminating the python code in between won't have much effect.

Also, if we are just planning to minimize the the delay between the fetches by eliminating the python code between them, we can simply store the result of the API calls in some variables, and reuse them later... but as I said, I don't think it'll help.

No, of course it's not about the lines in between, but the time in between. And time does depend on the code executed, mostly because that code typically also performs more API requests.

In this specific case it won't help much (because the different pagetemplates calls are quite close already), but this is also a rare case. The situation with pagelinks is not that there are two requests close together in one test function, but rather there are many calls all around the test suite. The time between the first request (which is then cached) and the test becomes quite large and it's therefore more likely to trigger the race condition.

In any case, not caching everything in Travis makes reasoning a lot easier. For example, https://gerrit.wikimedia.org/r/#/c/356835/3/tests/site_tests.py reduces to something along the lines of

all_templates = mysite.pagetemplates(mainpage)
ns10_templates = mysite.pagetemplates(mainpage, namespaces=[10])
if (all_templates == mysite.pagetemplates(mainpage)):
    self.assertIsSubsetOf(ns10_templates, all_templates)

Change 376898 had a related patch set uploaded (by Dalba; owner: Dalba):
[pywikibot/core@master] site_tests.py: Used a freezed API response to avoid race conditions

https://gerrit.wikimedia.org/r/376898

Change 377047 had a related patch set uploaded (by Dalba; owner: Dalba):
[pywikibot/core@master] site_tests.py: Avoid race conditions in test_pagetemplates and test_pagelinks

https://gerrit.wikimedia.org/r/377047

Change 376898 abandoned by Dalba:
site_tests.py: Use a freezed API response to avoid race conditions

Reason:
In favour of https://gerrit.wikimedia.org/r/#/c/377047

https://gerrit.wikimedia.org/r/376898

Change 377047 merged by jenkins-bot:
[pywikibot/core@master] site_tests.py: Avoid race conditions in test_pagetemplates and test_pagelinks

https://gerrit.wikimedia.org/r/377047

Change 356835 abandoned by Dalba:
site_tests.py: Handle random errors caused by page content change

Reason:
Change 377047 was merged as an alternative.

https://gerrit.wikimedia.org/r/356835

Dalba claimed this task.

Change 357195 abandoned by Dalba:
Disable request caching on Travis

Reason:
I guess this change can be abandoned now that 377047 has been merged.

https://gerrit.wikimedia.org/r/357195