Page MenuHomePhabricator

Some proofread page tests are writing to the wiki without being declared as write tests
Closed, ResolvedPublic

Description

_get_page_mappings is doing a purge, which is a write operation. That means these tests are writing to the wiki without the class being marked write = True.

Can the purge be avoided, so that these test methods always run?

An alternative approach is to create a new special value for the write flag which indicates that the test uses the purge operation, which is a special kind of write operation because it doesnt create a edit/log entry.

======================================================================
ERROR: test_check_if_cached_enws (tests.proofreadpage_tests.TestIndexPageMappings)
Test if cache is checked and loaded properly on wikisource:en
----------------------------------------------------------------------
Traceback (most recent call last):
  File "tests/aspects.py", line 688, in wrapped_method
    func(self, key)
  File "tests/proofreadpage_tests.py", line 456, in test_check_if_cached
    fetched_label = index_page.get_label_from_page_number(num)
  File "pywikibot/proofreadpage.py", line 471, in wrapper
    self._get_page_mappings()
  File "pywikibot/proofreadpage.py", line 514, in _get_page_mappings
    self.purge()
  File "pywikibot/page.py", line 1299, in purge
    return self.site.purgepages([self], **kwargs)
  File "pywikibot/site.py", line 1304, in callee
    return fn(self, *args, **kwargs)
  File "pywikibot/site.py", line 5465, in purgepages
    titles=[page for page in set(pages)])
  File "pywikibot/site.py", line 1959, in _simple_request
    site=self, **kwargs)
  File "tests/__init__.py", line 285, in create_simple
    return cls(site=site, parameters=kwargs)
  File "tests/__init__.py", line 279, in __init__
    super(TestRequest, self).__init__(0, *args, **kwargs)
  File "pywikibot/data/api.py", line 2205, in __init__
    super(CachedRequest, self).__init__(*args, **kwargs)
  File "pywikibot/data/api.py", line 1452, in __init__
    % self.site._userinfo['name'])
Error: API write action attempted as IP u'xxx.xxx.xxx.xxx'

======================================================================
ERROR: test_get_labels_enws (tests.proofreadpage_tests.TestIndexPageMappings)
Test IndexPage page get_label_from_* functions on wikisource:en
----------------------------------------------------------------------
Traceback (most recent call last):
  File "tests/aspects.py", line 688, in wrapped_method
    func(self, key)
  File "tests/proofreadpage_tests.py", line 483, in test_get_labels
    self.assertEqual(index_page.get_label_from_page_number(num), label)
  File "pywikibot/proofreadpage.py", line 471, in wrapper
    self._get_page_mappings()
  File "pywikibot/proofreadpage.py", line 514, in _get_page_mappings
    self.purge()
  File "pywikibot/page.py", line 1299, in purge
    return self.site.purgepages([self], **kwargs)
  File "pywikibot/site.py", line 1304, in callee
    return fn(self, *args, **kwargs)
  File "pywikibot/site.py", line 5465, in purgepages
    titles=[page for page in set(pages)])
  File "pywikibot/site.py", line 1959, in _simple_request
    site=self, **kwargs)
  File "tests/__init__.py", line 285, in create_simple
    return cls(site=site, parameters=kwargs)
  File "tests/__init__.py", line 279, in __init__
    super(TestRequest, self).__init__(0, *args, **kwargs)
  File "pywikibot/data/api.py", line 2205, in __init__
    super(CachedRequest, self).__init__(*args, **kwargs)
  File "pywikibot/data/api.py", line 1452, in __init__
    % self.site._userinfo['name'])
Error: API write action attempted as IP u'xxx.xxx.xxx.xxx'

======================================================================
ERROR: test_get_page_and_number_enws (tests.proofreadpage_tests.TestIndexPageMappings)
Test IndexPage page get_page_number functions on wikisource:en
----------------------------------------------------------------------
Traceback (most recent call last):
  File "tests/aspects.py", line 688, in wrapped_method
    func(self, key)
  File "tests/proofreadpage_tests.py", line 500, in test_get_page_and_number
    self.assertEqual(index_page.get_page_number_from_label(label),
  File "pywikibot/proofreadpage.py", line 471, in wrapper
    self._get_page_mappings()
  File "pywikibot/proofreadpage.py", line 514, in _get_page_mappings
    self.purge()
  File "pywikibot/page.py", line 1299, in purge
    return self.site.purgepages([self], **kwargs)
  File "pywikibot/site.py", line 1304, in callee
    return fn(self, *args, **kwargs)
  File "pywikibot/site.py", line 5465, in purgepages
    titles=[page for page in set(pages)])
  File "pywikibot/site.py", line 1959, in _simple_request
    site=self, **kwargs)
  File "tests/__init__.py", line 285, in create_simple
    return cls(site=site, parameters=kwargs)
  File "tests/__init__.py", line 279, in __init__
    super(TestRequest, self).__init__(0, *args, **kwargs)
  File "pywikibot/data/api.py", line 2205, in __init__
    super(CachedRequest, self).__init__(*args, **kwargs)
  File "pywikibot/data/api.py", line 1452, in __init__
    % self.site._userinfo['name'])
Error: API write action attempted as IP u'xxx.xxx.xxx.xxx'

======================================================================
ERROR: test_num_pages_enws (tests.proofreadpage_tests.TestIndexPageMappings)
Test num_pages property on wikisource:en
----------------------------------------------------------------------
Traceback (most recent call last):
  File "tests/aspects.py", line 688, in wrapped_method
    func(self, key)
  File "tests/proofreadpage_tests.py", line 471, in test_num_pages
    self.assertEqual(index_page.num_pages, self.sites[key]['num_pages'])
  File "pywikibot/proofreadpage.py", line 471, in wrapper
    self._get_page_mappings()
  File "pywikibot/proofreadpage.py", line 514, in _get_page_mappings
    self.purge()
  File "pywikibot/page.py", line 1299, in purge
    return self.site.purgepages([self], **kwargs)
  File "pywikibot/site.py", line 1304, in callee
    return fn(self, *args, **kwargs)
  File "pywikibot/site.py", line 5465, in purgepages
    titles=[page for page in set(pages)])
  File "pywikibot/site.py", line 1959, in _simple_request
    site=self, **kwargs)
  File "tests/__init__.py", line 285, in create_simple
    return cls(site=site, parameters=kwargs)
  File "tests/__init__.py", line 279, in __init__
    super(TestRequest, self).__init__(0, *args, **kwargs)
  File "pywikibot/data/api.py", line 2205, in __init__
    super(CachedRequest, self).__init__(*args, **kwargs)
  File "pywikibot/data/api.py", line 1452, in __init__
    % self.site._userinfo['name'])
Error: API write action attempted as IP u'xxx.xxx.xxx.xxx'

======================================================================
ERROR: test_page_gen_enws (tests.proofreadpage_tests.TestIndexPageMappings)
Test Index page generator on wikisource:en
----------------------------------------------------------------------
Traceback (most recent call last):
  File "tests/aspects.py", line 688, in wrapped_method
    func(self, key)
  File "tests/proofreadpage_tests.py", line 541, in test_page_gen
    self.assertRaises(ValueError, index_page.page_gen, -1, 2)
  File "/usr/lib64/python2.7/unittest/case.py", line 513, in assertRaises
    callableObj(*args, **kwargs)
  File "pywikibot/proofreadpage.py", line 613, in page_gen
    % (start, end, 1, self.num_pages))
  File "pywikibot/proofreadpage.py", line 471, in wrapper
    self._get_page_mappings()
  File "pywikibot/proofreadpage.py", line 514, in _get_page_mappings
    self.purge()
  File "pywikibot/page.py", line 1299, in purge
    return self.site.purgepages([self], **kwargs)
  File "pywikibot/site.py", line 1304, in callee
    return fn(self, *args, **kwargs)
  File "pywikibot/site.py", line 5465, in purgepages
    titles=[page for page in set(pages)])
  File "pywikibot/site.py", line 1959, in _simple_request
    site=self, **kwargs)
  File "tests/__init__.py", line 285, in create_simple
    return cls(site=site, parameters=kwargs)
  File "tests/__init__.py", line 279, in __init__
    super(TestRequest, self).__init__(0, *args, **kwargs)
  File "pywikibot/data/api.py", line 2205, in __init__
    super(CachedRequest, self).__init__(*args, **kwargs)
  File "pywikibot/data/api.py", line 1452, in __init__
    % self.site._userinfo['name'])
Error: API write action attempted as IP u'xxx.xxx.xxx.xxx'

Event Timeline

IIRC, the reason for purging is this: T114318

Can we detect when the expected class attributes/etc are missing, and only purge when they are missing? (or is the code already doing that?)

I think it is already like that -> if not self._soup.find_all('a', attrs=attrs):

Is it harmful just set write = True for those test classes?

I think it is already like that -> if not self._soup.find_all('a', attrs=attrs):

Yes, I see that. Hmm. Then the test suite is regularly purging the page, and these attributes, etc are disappearing every time?? This test class currently uses cached = True, which I guess should be removed as it is doing a purge operation, so I expect that the caching will be returning the pre-purge cached page text. Strange.

Is it harmful just set write = True for those test classes?

write = True means the tests dont run by default. Also, write = True currently checks that the site is a test wiki, not a real wiki, so these tests would need to only use test2.wikipedia.org .

That is why I suggested we might add a special value for the write flag, which only permits the purge operation, which could be done on any wiki and not only test wikis.

I think it is already like that -> if not self._soup.find_all('a', attrs=attrs):

Yes, I see that. Hmm. Then the test suite is regularly purging the page, and these attributes, etc are disappearing every time?? This test class currently uses cached = True, which I guess should be removed as it is doing a purge operation, so I expect that the caching will be returning the pre-purge cached page text. Strange.

The purge should happen only when the Index Page (on ws) for some reason has lost the info. It is not a "normal" situation.
Maybe at some time, the test suite might cache page text that requires purging and sometimes might not (I suppose the cache is cleared every now and then ....). This might also explain why now this is showing up?

The travis builds work with a clean cache for each job, so I doubt the caching is the only problem here.

It was failing on my PC. I cleared ~/python/core/tests/apicache-py2 and now it is OK.
Is this cache also cleared?

It was failing on my PC. I cleared ~/python/core/tests/apicache-py2 and now it is OK.
Is this cache also cleared?

Each travis job start with a cold cache. the proofread page tests are not first, so the cache is warm at these tests, but the cache only contains data fetched ~20 mins earlier.

Is this always failing or randomly?

This is happening because of the en.wikisource bug.
More info can be found here - T128986

Is it an option to remove:

@must_be(group='user') from site.def purgepages(self, pages, **kwargs)

and not consider "purge" as an action which sets self.write = True in api.Request()?

Can it do any harm?

I could do:

curl --request POST 'https://en.wikisource.org/w/api.php?action=purge&titles=Index:Popular+Science+Monthly+Volume+47.djvu'

getting:
{

"batchcomplete": "",
"purge": [
    {
        "ns": 106,
        "title": "Index:Popular Science Monthly Volume 47.djvu",
        "purged": ""
    }
]

}

This would solve both issues.

Purge is a write right. https://en.wikisource.org/w/api.php?action=paraminfo&modules=purge so we shouldnt remove the decorator without a lot of analysis/testing. We previously didnt have the decorator, and there were problems, probably with the assert extension.

Curl/wget is an interesting approach. http.fetch bypassing api.py will do the trick, at least as a temporary workaround.

Change 284754 had a related patch set uploaded (by Mpaa):
proofreadpage.py: workaround to allow purging without write rights

https://gerrit.wikimedia.org/r/284754

Change 284754 merged by jenkins-bot:
proofreadpage.py: workaround to allow purging without write rights

https://gerrit.wikimedia.org/r/284754

Only a workaround has been implemented.
It is good enough IMO to close the bug, even if the philosophical question of what is 'purge' still remains ....

Xqt claimed this task.
Xqt triaged this task as Lowest priority.
Xqt subscribed.

The remaining issue can be reopened but I guess there are other tasks with higher prio.

I spotted that this task is still mentioned in proofreadpage.purge(). Is that method still needed since this is resolved?