Page MenuHomePhabricator

Graceful degradation of generated HTML in the face of template-wrapping, section-wrapping, or other edit-client-support failures
Open, LowPublic

Description

Currently we have a bunch of bugs where template wrapping or section wrapping fails (either generating some potentially-uneditable output or crashing the page) or possibly some other scenario where Parsoid cannot handle wrt supporting edits. Examples: T180534: Failed template encapsulation, T184779: Clean up a rare edge case in section-wrapping and extension-content interaction, T240642: Production crashers in WrapSections code because of missing properties ($dsr, $pi, $parts).

As we move towards Parsoid serving all read views, we should figure out a way of gracefully recovering from some (if not all) of these errors so even if the page is not editable, it should still render properly, i.e. we should not use HTTP 500 / 400 / 503 error codes as our VE-based-edit-prevention mechanism. We can instead recover from those error scenarios and mark an appropriate or the entire page uneditable via appropriate annotations. This can either be mw:Uneditable or mw:Error with additional information in data-mw that also tells VE how to handle the error wrt display and editing functionality.

This lets us move ahead with Parsoid read views without blocking on fixing or handling all the bugs / edge cases we have.

T180534 might be a good first test case for working through this mechanism.


Related: See https://gerrit.wikimedia.org/r/#/c/415992/

... a few things to happen. (a) protect the page from being edited via some error markup on the body tag (b) still generate HTML that enables the page to be read.

(a) is resolved as a requirement -- needs to be implemented.
(b) requires us to skip any dom passes that is editing-support only. It also requires adding defensive code in rendering-related passes that might inspect transclusion state.

Event Timeline

Arlolra triaged this task as Medium priority.Apr 6 2018, 3:35 PM

We also need something comprehensive for the missing data-mw parts that results, for the subsequent dom passes that might assume it's there. See T196799 and T187386

ssastry renamed this task from Protect pages from being edited when template wrapping fails to Graceful degradation of generated HTML in the face of template-wrapping, section-wrapping, or other edit-client-support features.May 12 2020, 6:00 PM
ssastry updated the task description. (Show Details)
ssastry added a project: Parsoid-Rendering.
ssastry moved this task from Feature requests to Missing Functionality on the Parsoid board.
ssastry added a subscriber: ssastry.

We should start by gathering some data on how often these failures occur.

MBinder_WMF lowered the priority of this task from Medium to Low.Jul 9 2020, 6:27 PM

Thoughts from meeting today:

  1. As @Arlolra suggests above, one step would be to automate a weekly list of "top crashers by frequency" as well as perhaps distinguishing by source (crashers caused by a read view or edit attempt that were directly visible to some user -vs- crashers caused by change propagation that perhaps were never visible to a human). Then we can focus and prioritize work, in addition to quantifying how far down the long tail we actually are.
  2. It was suggested that the root cause of most of our crashers is template wrapping. We should try to substantiate that claim. If it is true, then we might do a deep dive on template wrapping, perhaps being quicker to remove DSR information from unusual cases instead of "trying hard" to come up with something that we can't guarantee is correct. Another view of this: perhaps we should write a "DSR verifier" pass, that runs at intervals in the pipeline and enforces some well-formed-ness guarantees on DSR information, again, marking an entire subtree uneditable and clearing bad DSR if those guarantees don't hold. "Clearing DSR information" is one version of what "error recovery" might look like. (We still need to guarantee that all parts of our codebase that handle DSR information can handle the case where it is missing, but that's probably amenable to static analysis or even a git grep dsr.)
  3. We need to be careful not to obscure the root cause. It is probably easier to compound the problem by overly-simplistic error-recovery -- in some sense you might say that the error-recovery of the DSR passes is what is causing these problems to begin with. Consider also for example UTF-8 errors that are caused by bad DSR information: if we ignore the UTF-8 error we end up emitting a document with bad UTF-8, which can cause crashes further down the line. If we instead remove or truncate the string that would have contained the bad UTF-8, then we can cause selser errors when that string is substituted back into the document, etc. We need to ensure that whatever error recovery scheme we choose is actually causing better outcomes for the user, not just replacing the clear crash with more obscure and unpredictable corruption. I think a crash is still preferred to corruption. Replacing subtrees with "<p>Parser error</p>" as the legacy parser often does is fine *so long as we are super careful* to ensure that <p>Parser Error</p> never makes it through selser to corrupt the original document.
ssastry renamed this task from Graceful degradation of generated HTML in the face of template-wrapping, section-wrapping, or other edit-client-support features to Graceful degradation of generated HTML in the face of template-wrapping, section-wrapping, or other edit-client-support failures.Feb 13 2021, 1:54 AM