Many template updates don't actually change the content of every page that uses the template. We should detect this and avoid storing a new render in RESTBase if nothing changed.
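A minimal sketch of the idea, not the RESTBase implementation: hash each new render and only write it if the digest differs from the one stored last. The `RenderStore` class, its `store_if_changed` method, and the in-memory dict are all hypothetical stand-ins for illustration.

```python
import hashlib
from typing import Dict


class RenderStore:
    """Hypothetical in-memory stand-in for a render store keyed by page title."""

    def __init__(self) -> None:
        # title -> SHA-256 digest of the most recently stored render
        self._latest: Dict[str, str] = {}
        # number of renders actually written
        self.writes = 0

    def store_if_changed(self, title: str, html: str) -> bool:
        """Store the render only if it differs from the latest stored one.

        Returns True if a new render was written, False if it was skipped.
        """
        digest = hashlib.sha256(html.encode("utf-8")).hexdigest()
        if self._latest.get(title) == digest:
            # Re-render produced identical output: skip the write.
            return False
        self._latest[title] = digest
        self.writes += 1
        return True


store = RenderStore()
store.store_if_changed("Foo", "<p>a</p>")  # first render: written
store.store_if_changed("Foo", "<p>a</p>")  # template update, same output: skipped
store.store_if_changed("Foo", "<p>b</p>")  # output changed: written
```

An exact comparison like this only skips a write when the re-render is byte-for-byte identical, so any non-determinism in the output (see T93715) reduces how often it helps.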
Description
| Status | Assigned | Task |
|---|---|---|
| Invalid | None | T93751 RFC: Next steps for long-term revision storage -- space needs, storage hierarchies |
| Resolved | GWicke | T93779 Only store a new render of Parsoid HTML / data-parsoid revision if the content actually changed after a template update |
| Open | None | T93715 [EPIC] Make Parsoid HTML output completely deterministic |
| Resolved | marcoil | T63165 Parsoid's Cite extension sometimes produces different ids for the same <ref> source |
| Open | None | T206222 Make "about" attribute IDs deterministic |
Event Timeline
This was deployed today. Somewhere between 50% and 75% of template updates turn out not to change the content at all, even while there are still other sources of non-determinism in Parsoid output (T93715). Once those are fixed / deployed (a major one is scheduled for next Monday) this rate should increase further.
Combined with T93777, this reduced the rate of new revision entries stored from an average of around 110/s to 37/s.
Also cc'ing @aaron, @ssastry and @tstarling, as I think these numbers are quite interesting. There is a lot of potential to optimize template updates, especially if we can figure out some conditions under which the output is not going to be affected.