Parsoid element IDs are not stable at this point. Template re-expansion, and possibly even a non-deterministic processing order, can currently change the ID assignment between different renders of the same article revision.
Currently we only require the oldid to be passed in for html2wt conversion. Given the oldid, we then send the html and data-parsoid of the latest render to Parsoid. However, Parsoid assumes that the received data-parsoid actually corresponds to the render the edited HTML is based on. If the ID assignment changed between those renders, this can lead to major corruption.
I see two main solutions to this issue:
- Parsoid: T93715: [EPIC] Make Parsoid HTML output completely deterministic. It should be possible to do this for the vast majority of cases, but there will still be some extreme cases of templates restructuring the page that could change the ID assignment.
- Require clients to pass in the original etag (tid) in the html2wt request, possibly in an If-Match header or the path, and use that to retrieve the exact render this edit is based on. This is safe, but more complex for clients, and it impinges on RESTBase's ability to garbage-collect older renders in a timely manner (see T94196).
- Related: T93086: Enforce tid equality of html and data-parsoid passed to Parsoid on html2wt serialization (a much less common race condition).
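
To make the second option more concrete, here is a minimal sketch of how the render lookup for html2wt could work when the client sends the original etag. It is not RESTBase code: the `Render` type, the in-memory `store`, the `select_render` helper and the `"<revision>/<tid>"` etag layout are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Tuple


@dataclass(frozen=True)
class Render:
    """One stored render of a revision (hypothetical shape)."""
    tid: str            # render identifier, treated as an opaque string here
    html: str
    data_parsoid: dict  # element id -> source metadata


class RenderMismatch(Exception):
    """Raised when no safe render can be selected for html2wt."""


def parse_etag(etag: str) -> Tuple[int, str]:
    # Assumed etag layout: "<revision>/<tid>", optionally wrapped in quotes.
    rev, _, tid = etag.strip().strip('"').partition("/")
    if not rev.isdigit() or not tid:
        raise ValueError(f"unexpected etag: {etag!r}")
    return int(rev), tid


def select_render(
    store: Dict[Tuple[int, str], Render],
    oldid: int,
    if_match: Optional[str],
) -> Render:
    """Return the exact render the edit was based on, or refuse."""
    if if_match is None:
        raise RenderMismatch("original etag (If-Match) is required")
    rev, tid = parse_etag(if_match)
    if rev != oldid:
        raise RenderMismatch("etag revision does not match oldid")
    render = store.get((rev, tid))
    if render is None:
        # For example, the render was already garbage-collected.
        raise RenderMismatch("render for this tid is no longer available")
    return render
```

Because html and data-parsoid are fetched together under one `(revision, tid)` key, this lookup also gives the tid equality that T93086 asks for. The `None` branch is where the garbage-collection concern shows up: an edit based on a render that has already been dropped has to be rejected or handled some other way.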