
Enrich "parent" HTML using diffs
Closed, Resolved (Public)

Description

During the HTML enrichment pipeline, we also need to enrich the event with the HTML content of the parent revision. To do that, we are doing the following:

  • Use the new schema designed for it.
  • If the page change has a parent_rev_id, call the HTML endpoint to get the parent revision's content.
  • Given the main HTML (html_canonical) and the parent HTML, compute a unified_diff and store it in the event.
  • Make sure the parent HTML can be rebuilt completely from html_canonical and the unified_diff.
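
The last two steps can be sketched roughly as below. This is only an illustration, not the pipeline's actual code: the function names (make_unified_diff, rebuild_parent) are hypothetical, and it assumes the diff is computed from html_canonical to the parent HTML, so that applying it to html_canonical yields the parent. It uses Python's difflib and splits on "\n" only, so a missing trailing newline survives the round trip.

```python
import difflib
import re

# Matches hunk headers such as "@@ -1,3 +1,2 @@" (a count of 1 may be omitted).
_HUNK = re.compile(r"^@@ -(\d+)(?:,(\d+))? \+(\d+)(?:,(\d+))? @@")


def make_unified_diff(html_canonical: str, parent_html: str) -> str:
    """Diff from html_canonical to parent_html (direction is an assumption)."""
    a = html_canonical.split("\n")  # lossless: "\n".join(a) == html_canonical
    b = parent_html.split("\n")
    return "\n".join(
        difflib.unified_diff(a, b, "html_canonical", "parent_html", lineterm="")
    )


def rebuild_parent(html_canonical: str, unified_diff: str) -> str:
    """Apply a diff produced by make_unified_diff to html_canonical."""
    if not unified_diff:  # identical inputs produce an empty diff
        return html_canonical
    src = html_canonical.split("\n")
    diff = unified_diff.split("\n")
    out, pos, i = [], 0, 0
    while i < len(diff):
        m = _HUNK.match(diff[i])
        i += 1
        if not m:  # file header lines ("---" / "+++") outside hunks
            continue
        start = int(m.group(1))
        from_left = int(m.group(2)) if m.group(2) is not None else 1
        to_left = int(m.group(4)) if m.group(4) is not None else 1
        # Empty source ranges are numbered by the line just before them.
        start_idx = start - 1 if from_left else start
        out.extend(src[pos:start_idx])  # copy unchanged lines before the hunk
        pos = start_idx
        while from_left or to_left:
            tag, text = diff[i][:1], diff[i][1:]
            i += 1
            if tag == "+":  # line present only in the parent
                out.append(text)
                to_left -= 1
                continue
            if src[pos] != text:
                raise ValueError("diff does not match html_canonical")
            if tag == " ":  # context line, present in both
                out.append(text)
                to_left -= 1
            pos += 1  # context and "-" lines both consume a source line
            from_left -= 1
    out.extend(src[pos:])  # copy unchanged lines after the last hunk
    return "\n".join(out)
```

A round-trip check such as rebuild_parent(html_canonical, make_unified_diff(html_canonical, parent_html)) == parent_html is one way to verify that the stored diff is enough to recover the parent HTML.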

Event Timeline

Change #1244629 had a related patch set uploaded (by JavierMonton; author: JavierMonton):

[operations/mediawiki-config@master] component: mediawiki.page_html_content_change.dev0

https://gerrit.wikimedia.org/r/1244629

Change #1244629 merged by jenkins-bot:

[operations/mediawiki-config@master] component: mediawiki.page_html_content_change.dev0

https://gerrit.wikimedia.org/r/1244629

Mentioned in SAL (#wikimedia-operations) [2026-02-26T16:51:32Z] <javiermonton@deploy2002> Started scap sync-world: Backport for [[gerrit:1244629|component: mediawiki.page_html_content_change.dev0 (T418467)]]

Mentioned in SAL (#wikimedia-operations) [2026-02-26T16:53:35Z] <javiermonton@deploy2002> javiermonton: Backport for [[gerrit:1244629|component: mediawiki.page_html_content_change.dev0 (T418467)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.

Mentioned in SAL (#wikimedia-operations) [2026-02-26T17:03:27Z] <javiermonton@deploy2002> Finished scap sync-world: Backport for [[gerrit:1244629|component: mediawiki.page_html_content_change.dev0 (T418467)]] (duration: 11m 55s)

Change #1245383 had a related patch set uploaded (by JavierMonton; author: JavierMonton):

[operations/deployment-charts@master] stream: mw-page-html-content-change-enrich-next

https://gerrit.wikimedia.org/r/1245383

Change #1245383 merged by jenkins-bot:

[operations/deployment-charts@master] stream: mw-page-html-content-change-enrich-next

https://gerrit.wikimedia.org/r/1245383

Change #1245410 had a related patch set uploaded (by JavierMonton; author: JavierMonton):

[operations/deployment-charts@master] stream: mediawiki.page_html_content_change

https://gerrit.wikimedia.org/r/1245410

Change #1245410 merged by jenkins-bot:

[operations/deployment-charts@master] stream: mediawiki.page_html_content_change

https://gerrit.wikimedia.org/r/1245410

Change #1247962 had a related patch set uploaded (by JavierMonton; author: JavierMonton):

[operations/deployment-charts@master] stream: mw-page-html-content-change-enrich-next

https://gerrit.wikimedia.org/r/1247962

Change #1247962 merged by JavierMonton:

[operations/deployment-charts@master] stream: mw-page-html-content-change-enrich-next

https://gerrit.wikimedia.org/r/1247962