
Edit conflict not recognized
Closed, Duplicate (Public)

Description

I ran reflinks.py, which uses

pagegenerators.PreloadingGenerator(generator, step=50)

(every script that uses this is potentially affected by this bug).

After the script retrieved the content of the page, and before it saved the page, a user edited the page. The script then saved the page with the old content, without any warning.
So if a user edits a page while the script is running, that edit is lost.

Example: https://it.wikipedia.org/w/index.php?title=Lituania&diff=prev&oldid=75880159 where my bot undid the anonymous user's edit

Event Timeline

Beta16 created this task.Oct 23 2015, 10:18 AM
Beta16 raised the priority of this task from to High.
Beta16 updated the task description.
Beta16 added a subscriber: Beta16.
Restricted Application added subscribers: pywikibot-bugs-list, Aklapper.Oct 23 2015, 10:18 AM
XZise added a subscriber: XZise.Oct 23 2015, 10:24 AM

The wiki should actually detect that there is an edit conflict and report it accordingly. The script should also print a warning (at least in that script) that an edit conflict occurred.

Maybe changes related to latest_revision have broken edit conflict detection for preloaded pages?

Unfortunately we don't have tests that cover this (see T116372).

A quick workaround for that would be for the test case to pause and ask the test user to make an edit manually. That test case would be disabled for CI of course.

Maybe (this is only a hypothesis) the wiki detects an edit conflict only in the second step, when the script saves the page.

But consider this sequence:

  1. the script retrieves 50 pages from Wikipedia
  2. the script processes the pages one by one, saving each one after processing it
  3. a user edits a page that has not yet been processed by the script
  4. when the script processes that page, it makes its own edit and saves the page based on the old revision (the content retrieved in step 1), without the user's edit

I hope it's clearer now.
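The four steps above can be sketched with a toy in-memory "wiki" that performs no conflict check at all. This is only an illustration of the lost update, not pywikibot code; the dictionary, the page titles and the texts are made up for the example.

```python
# Toy model of the scenario: an in-memory "wiki" with no conflict check.
wiki = {"Lituania": "original text", "Other": "other text"}

# Step 1: the script preloads a batch of pages (a snapshot of their content).
preloaded = dict(wiki)

# Step 3: a user edits a page that the script has not processed yet.
wiki["Lituania"] = "original text + user edit"

# Steps 2 and 4: the script processes each page from its snapshot and saves it.
for title, old_text in preloaded.items():
    wiki[title] = old_text + " + bot edit"

print(wiki["Lituania"])  # the user's edit is no longer in the saved text
```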

The wiki lets the client send the timestamp of the base revision along with the edit, so that it can detect a conflict. In theory we do that (see APISite.editpage), but only if the actual base revision is cached. Otherwise it might load the latest revision directly before the save, which of course defeats the purpose: there can't be a conflict with the latest revision if it's loaded almost immediately before saving.
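A minimal sketch of why the base timestamp has to come from the revision the bot actually worked on. ToyWiki, EditConflict and run are hypothetical names for this illustration only; the logical integer timestamps stand in for real revision timestamps, and none of this is the MediaWiki or pywikibot implementation.

```python
import itertools


class EditConflict(Exception):
    """Raised by the toy wiki when the base revision is outdated."""


class ToyWiki:
    """In-memory stand-in for a wiki holding a single page (hypothetical)."""

    def __init__(self, text):
        self._clock = itertools.count(1)  # logical timestamps: 1, 2, 3, ...
        self.text = text
        self.timestamp = next(self._clock)

    def edit(self, text, basetimestamp=None):
        # Reject the save if the page changed after the caller's base revision.
        if basetimestamp is not None and self.timestamp > basetimestamp:
            raise EditConflict("page changed since %d" % basetimestamp)
        self.text = text
        self.timestamp = next(self._clock)


def run(use_cached_base):
    wiki = ToyWiki("original")
    cached_ts = wiki.timestamp          # timestamp seen when preloading
    wiki.edit("original + user edit")   # a user edits in the meantime
    # Either send the preloaded timestamp, or re-read it just before saving.
    base = cached_ts if use_cached_base else wiki.timestamp
    try:
        wiki.edit("original + bot edit", basetimestamp=base)
        return "saved: " + wiki.text
    except EditConflict:
        return "conflict: " + wiki.text


print(run(use_cached_base=True))
print(run(use_cached_base=False))
```

With the cached timestamp the toy wiki refuses the save and the user's edit survives; with a timestamp re-read just before saving, the save goes through and overwrites it.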

So it requests the latest revision in editpage:

[…]
if basetimestamp or not recreate:
    try:
        lastrev = page.latest_revision
        basetimestamp = lastrev.timestamp
    except NoPage:
        basetimestamp = False
        if not recreate:
            raise
[…]

And the method in the Page class:

@property
def latest_revision(self):
    """Return the current revision for this page."""
    rev = self._latest_cached_revision()
    if rev is not None:
        return rev
    return next(self.revisions(content=True, total=1))

And that uses _latest_cached_revision, which returns a valid revision when the PreloadingGenerator is used:

>>> from pywikibot.pagegenerators import PreloadingGenerator as P
>>> import pywikibot as py
>>> s = py.Site()
>>> p = py.Page(s, 'Main Page')
>>> pp = list(P([p]))[0]
Retrieving 1 pages from wikipedia:test.
>>> pp._latest_cached_revision() is None
False
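The cache-or-fetch behaviour of latest_revision can be illustrated on its own. ToyPage is a hypothetical stand-in, not pywikibot's Page class, and the revision dictionaries and the fetch callable are assumptions made for this sketch.

```python
class ToyPage:
    """Hypothetical stand-in for a page object; not pywikibot's Page class."""

    def __init__(self, fetch, preloaded_revision=None):
        self._fetch = fetch                # callable returning the live revision
        self._cached = preloaded_revision  # set when the page was preloaded

    @property
    def latest_revision(self):
        # Prefer the cached revision; only ask the wiki when nothing is cached.
        if self._cached is not None:
            return self._cached
        return self._fetch()


live = {"timestamp": 2}  # what the wiki holds right now
preloaded = ToyPage(lambda: live, preloaded_revision={"timestamp": 1})
fresh = ToyPage(lambda: live)

print(preloaded.latest_revision["timestamp"])  # timestamp from preload time
print(fresh.latest_revision["timestamp"])      # timestamp fetched on demand
```

A preloaded page therefore reports the timestamp of the revision it was loaded with, which is the value a conflict check needs as its base.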
Mpaa added a subscriber: Mpaa.Oct 23 2015, 8:19 PM

This was solved in https://phabricator.wikimedia.org/T93364.
We should see what has changed since then.

Nnemo added a subscriber: Nnemo.Oct 24 2015, 4:06 PM
Mpaa added a comment.Oct 27 2015, 8:04 PM

I could not reproduce it.

As MpaaBot:

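import pywikibot
from pywikibot.pagegenerators import PrefixingPageGenerator, PreloadingGenerator

# Site inferred from the "wikisource:en" output below.
site = pywikibot.Site('en', 'wikisource')
page = pywikibot.Page(site, 'User:Mpaa/y')
gen = PrefixingPageGenerator(prefix='User:Mpaa/', site=site)
gen = PreloadingGenerator(gen)
for p in gen:
    if p == page:
        pp = p  # keep the preloaded copy of the page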

Retrieving 20 pages from wikisource:en.

In [25]: pp._latest_cached_revision()
Out[25]: {'comment': u'new test', '_sha1': None, 'text': u'yet another', 'revid': 5405347, 'anon': False, 'user': u'Mpaa', 'rollbacktoken': None, 'timestamp': Timestamp(2015, 5, 3, 20, 34, 51), '_content_model': u'wikitext', '_parent_id': 5405334, 'minor': False}

Then I made an edit as Mpaa via the browser,
and continued in the shell.

In [26]: pp.text = 'test'
In [27]: pp.save()
WARNING: API error editconflict: Edit conflict detected

EditConflict: Page [[en:User:Mpaa/y]] could not be saved due to an edit conflict

In [28]: pp._latest_cached_revision()
Out[28]: {'comment': u'new test', '_sha1': None, 'text': u'yet another', 'revid': 5405347, 'anon': False, 'user': u'Mpaa', 'rollbacktoken': None, 'timestamp': Timestamp(2015, 5, 3, 20, 34, 51), '_content_model': u'wikitext', '_parent_id': 5405334, 'minor': False}

At this stage I cannot explain how the edit in the task's description could have happened.