
High proportion of edit conflicts seem to come from new article creation
Open, Medium, Public

Description

I'm still investigating, and likely have misinterpreted the few clues. As part of Two-Column-Edit-Conflict-Merge, we instrumented the EditPageBeforeConflictDiff hook to record some information about the conflict, for both the legacy and newer TwoColConflict conflict workflows.

The event fields include,

  • $baseRevision = $editPage->getBaseRevision();
  • $latestRevision = $editPage->getArticle()->getRevision();

In other words, $baseRevision was the latest revision of the article at the time the edit page was opened, and $latestRevision is the latest revision at the time the edit page is submitted.
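As a rough illustration of how these two fields can be read together, here is a minimal Python sketch. It assumes that a revision ID of 0 stands for "no saved revision"; the function name and the category labels are hypothetical and are not part of the instrumentation.

```python
def classify_conflict(base_revision_id: int, latest_revision_id: int) -> str:
    """Label a conflict event from its two logged revision IDs.

    Assumption: an ID of 0 means no saved revision existed at that moment.
    The labels returned here are made up for this sketch.
    """
    if base_revision_id == 0 and latest_revision_id == 0:
        # No revision when the edit page was opened, none when it was submitted.
        return "new-article"
    if base_revision_id == 0:
        # The page did not exist at open time but did at submit time.
        return "created-while-editing"
    if base_revision_id == latest_revision_id:
        # The latest revision did not change between open and submit.
        return "no-intervening-revision"
    # The page existed and its latest revision changed in the meantime.
    return "existing-article"


print(classify_conflict(0, 0))
print(classify_conflict(1200, 1234))
```

Only the first category should be impossible for a legitimate conflict between two editors, since nobody else has saved anything in between.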

What we discovered is that more than a quarter of the edit conflicts have $baseRevision = $latestRevision = 0, which means that the article is new and no saved revision exists. This is happening at equal rates for both the legacy and new conflict interface, so it's likely caused by EditPage logic. Querying the Hive event database:

-- Count January 2020 conflict events, and how many of them had no base revision.
select
    sum(if(event.baseRevisionId == 0, 1, 0)) as new,
    count(*) as total
from twocolconflictconflict
where
    year=2020 and month=1;

new     total
1399    5263
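For reference, the share implied by this result can be worked out directly. The Python snippet below is only a small sketch of that arithmetic; the helper name is hypothetical.

```python
def share(part: int, total: int) -> float:
    """Return part/total as a fraction, rejecting a non-positive total."""
    if total <= 0:
        raise ValueError("total must be positive")
    return part / total


# Counts taken from the query result above (January 2020).
new_conflicts, total_conflicts = 1399, 5263
print(f"{share(new_conflicts, total_conflicts):.1%}")  # prints 26.6%
```

So for January 2020, a little over a quarter of the logged conflicts had a base revision ID of 0.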

From a superficial reading of EditPage, my only guess is that doEditContent has failed with one of these error codes: edit-gone-missing, edit-conflict, or edit-already-exists.

Event Timeline

awight created this task.Feb 28 2020, 1:27 PM
Restricted Application added a subscriber: Aklapper. · View Herald TranscriptFeb 28 2020, 1:27 PM

@awight What's the severity of impact for this?

Change 576901 had a related patch set uploaded (by Awight; owner: Awight):
[mediawiki/extensions/TwoColConflict@master] Log debugging information at start of conflicts

https://gerrit.wikimedia.org/r/576901

Change 576901 merged by jenkins-bot:
[mediawiki/extensions/TwoColConflict@master] Log debugging information at start of conflicts

https://gerrit.wikimedia.org/r/576901

@awight What's the severity of impact for this?

I'm not sure yet. I'm imagining it to be one of these possibilities:

  • These could be legitimate edit conflicts but not for new page creation, and the logging is broken, or
  • Editors may be experiencing a lot of unnecessary edit conflicts when creating new articles, for reasons other than a legitimate conflict with another editor trying to create the same page. I believe I saw code in the storage backend that can throw unrelated errors, and these are translated into an edit conflict.

The merged patch above has some logging that should help diagnose this. Meanwhile, spot-checking a couple of rows, even with just the information we have in EventLogging, suggests the latter. I think we're occasionally blocking people from saving new pages for an obscure technical reason.

It's also possible that the conflict happens because of a gadget or user script, and the user never sees it.

WDoranWMF removed Pchelolo as the assignee of this task.Mar 24 2020, 8:44 PM
WDoranWMF triaged this task as Medium priority.
WDoranWMF raised the priority of this task from Medium to Needs Triage.
WDoranWMF triaged this task as Medium priority.
WDoranWMF added a subscriber: Pchelolo.

Hello @awight, is this still an issue and, if so, what did the logging reveal?

Change 620000 had a related patch set uploaded (by Awight; owner: Awight):
[analytics/wmde/TW/edit-conflicts@master] [WIP] Explore mystery conflicts

https://gerrit.wikimedia.org/r/620000

Change 620048 had a related patch set uploaded (by Awight; owner: Awight):
[analytics/wmde/TW/edit-conflicts@master] [WIP] Refresh "new article" conflicts for 2020

https://gerrit.wikimedia.org/r/620048

Hello @awight, is this still an issue and, if so, what did the logging reveal?

Yes, thanks for the prompt! I refreshed this notebook and unfortunately, this is still happening. I don't see any pattern to who is affected. The fact that it happens to anonymous users as well as a variety of logged-in users makes me think it's not just a misbehaving gadget, for example.

You can see these conflicts by looking in Schema:TwoColConflictConflict for rows where event.baseRevisionId == 0, or feel free to use the preprocessed Spark Parquet path in HDFS, /user/awight/edit-conflicts/new_article_conflicts. Happy to help with anything; it seems important to solve this mystery!
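If the rows are exported as plain records, the same filter can be sketched in a few lines of Python. This is an illustration only: the nested "event" layout mirrors the event.baseRevisionId column used in the query, but the isAnon field name and the sample rows are assumptions made up for this example, not real data.

```python
from collections import Counter


def new_article_conflicts(rows):
    """Keep only the conflict events whose base revision ID is 0."""
    return [row for row in rows if row["event"]["baseRevisionId"] == 0]


def tally_by_user_type(rows):
    """Count events per user type, using an assumed boolean isAnon field."""
    return Counter(
        "anonymous" if row["event"].get("isAnon") else "logged-in"
        for row in rows
    )


# Made-up rows for illustration; they do not come from the event database.
sample_rows = [
    {"event": {"baseRevisionId": 0, "isAnon": True}},
    {"event": {"baseRevisionId": 0, "isAnon": False}},
    {"event": {"baseRevisionId": 981, "isAnon": False}},
]

suspects = new_article_conflicts(sample_rows)
print(len(suspects), dict(tally_by_user_type(suspects)))
```

A tally like this is one way to check whether the affected events cluster around a particular kind of user.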

I have a few more questions that I might try to answer tomorrow. Please add anything else that you come up with.

  • Are these users actually running into a conflict, and are unable to save the page?
  • If so, what proportion of users figure out that saving twice gets them past the glitch? (Unknown if this is a real workflow.)
  • Might want to instrument EditPage errors (I wish I'd done so months ago).
  • Can anyone reproduce similar EventLogging records locally, by forcing various errors?

awight moved this task from Backlog to Watching - active on the User-awight board.

Change 620048 abandoned by Awight:
[analytics/wmde/TW/edit-conflicts@master] [WIP] Refresh conflict bug reports

Reason:
squashed

https://gerrit.wikimedia.org/r/620048

Change 620000 merged by Andrew-WMDE:
[analytics/wmde/TW/edit-conflicts@master] Sample more of 2020

https://gerrit.wikimedia.org/r/620000