Design a TwoColConflict workflow for (multi-part) ProofreadPage pages
Open, LowestPublic

Description

To do:

ProofreadPage uses special namespaces ("Page" and "Index", numbered 250 and 252), as well as special content models for these ("proofread-page" and "proofread-index"). A "page", for example, is made of multiple chunks of wikitext. How does conflict detection for these even work, if at all?

Event Timeline

I did a first test. This was easier than expected because I have the extension running on my dev system since T240858.

Screenshot from 2020-05-15 15-45-10.png (747×739 px, 157 KB)

  • The radio buttons at the bottom are broken. Fixed.
    • This is probably because the relevant JavaScript is not loaded.
  • Setting the radio buttons to something else is doomed to fail. The UI element is meant to change what's called the "pagequality" level between 0 and 4. But the moment we see the conflict, this number is already baked into the page's wikitext, visible as a <pagequality> element. Fixed.
    • It's even possible to have a conflict in this number.
  • Proofread pages have a header and a footer. Both appear in the conflict as part of the body, wrapped in <noinclude> tags. This is unexpected. The editor meant to edit these pages shows 3 separate textareas. The extension even does custom diffing to show these 3 elements in a nice way. The wikitext we see in the conflict is the internal format, as it is stored in the database.

As long as I don't touch any of the <noinclude> and <pagequality> tags, I can resolve the conflict, and the resulting page will still work as expected. But the moment I mess with the tags, the deserializer in PageContentHandler::unserializeContentInWikitext() fails. Luckily it fails in a "nice" way: nothing gets lost, but the footer, the header, and the quality rank might become part of the body, along with the now pointless <noinclude> tags. This then has to be fixed manually.
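To illustrate the failure mode described above, here is a minimal Python sketch of that kind of "nice" fallback. It is not the actual ProofreadPage deserializer (which is PHP). The exact serialized layout in the regex, the `split_page` function, and the `ProofreadParts` type are assumptions made up for illustration; only the <noinclude> wrapping and the <pagequality> element come from the observations above.

```python
import re
from typing import NamedTuple, Optional


class ProofreadParts(NamedTuple):
    """Hypothetical container for the pieces of a serialized Page: page."""
    header: str
    body: str
    footer: str
    level: Optional[int]  # quality level 0-4, or None if it could not be parsed


# Assumed layout (illustration only):
# <noinclude><pagequality level="N" ... />HEADER</noinclude>BODY<noinclude>FOOTER</noinclude>
_PAGE_RE = re.compile(
    r'\A<noinclude><pagequality level="(?P<level>[0-4])"[^>]*/>'
    r'(?P<header>.*?)</noinclude>'
    r'(?P<body>.*)'
    r'<noinclude>(?P<footer>.*?)</noinclude>\Z',
    re.DOTALL,
)


def split_page(wikitext: str) -> ProofreadParts:
    """Split serialized wikitext into header, body, footer and quality level.

    If the wrapper tags are damaged, nothing is discarded: the whole text,
    stray tags included, is returned as the body.
    """
    match = _PAGE_RE.match(wikitext)
    if match is None:
        return ProofreadParts("", wikitext, "", None)
    return ProofreadParts(
        match["header"], match["body"], match["footer"], int(match["level"])
    )


intact = (
    '<noinclude><pagequality level="3" user="Example" />Head</noinclude>'
    'Body text<noinclude>Foot</noinclude>'
)
# Simulate a conflict resolution that accidentally dropped the opening tag.
damaged = intact.replace("<noinclude><pagequality", "<pagequality", 1)

print(split_page(intact))
print(split_page(damaged))
```

With the intact input the three parts and the level come back separately. With the damaged input everything ends up in the body, including the leftover tags, which matches what shows up on the page after a botched conflict resolution.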

But wait, here is where it gets crazy: the exact same thing happens in the original core conflict resolution screen!

In other words: Having TwoColConflict enabled in the Proofread namespaces doesn't make anything worse. Still, it might create the impression that TwoColConflict is broken.

I tried to quickly scan en.wikisource but couldn't find a manual page mentioning this or users complaining about this.

Possible ways forward:

  • Leave as is. TwoColConflict is certainly useful. These pages are just normal wikitext pages, except for the header and footer.
  • Disable TwoColConflict for these content models, or in these namespaces.
  • Disable editing for the first and last row in this namespace, making it impossible to accidentally edit the <noinclude> code.
  • Introduce special handling for the header, footer, and quality rank. Note this would create an awkward dependency between the two extensions.
  • Note that just hiding header, footer, and quality rank is not an option. Conflicts can be in all these elements, and need to be resolvable.

Change 596670 had a related patch set uploaded (by Thiemo Kreuz (WMDE); owner: Thiemo Kreuz (WMDE)):
[mediawiki/extensions/ProofreadPage@master] Remove broken radio buttons from conflict resolution interface

https://gerrit.wikimedia.org/r/596670

Change 596670 merged by jenkins-bot:
[mediawiki/extensions/ProofreadPage@master] Remove broken radio buttons from conflict resolution interface

https://gerrit.wikimedia.org/r/596670

This is presumably not an issue that comes up often because it is rare that edit conflicts occur in ProofreadPage's Page: namespace pages. Each such page represents one single page of a physical book, and the editing done there is either transcribing the original text or correcting the wikitext to match the book, both tasks that are by nature solitary. It is relatively rare for more than one person to work on an entire book (everything covered by an Index:), and for a single Page: page it is exceedingly rare. I've seen this happen maybe twice in half a decade.

It would be nice if TwoCol presented an easy opportunity to fix this state of affairs for the few times it does happen, but if given a choice I would personally probably hold out for Multi-Content Revisions and structured data to move the header, footer, and pagequality out of the wikitext and into separate MCR slots with suitable content models (and JS API!).

If TwoCol punts on this there won't be wailing and gnashing of teeth, is what I'm sayin’… :)

thiemowmde triaged this task as Lowest priority. May 20 2020, 3:05 PM

Amazingly helpful insight. Thanks a lot!

thiemowmde renamed this task from Test ProofreadPage extension together with TwoColConflict to Design a TwoColConflict workflow for (multi-part) ProofreadPage pages. May 9 2022, 7:27 AM