Integrate Saneitizer with SUP
Closed, Resolved | Public | 13 Estimated Story Points

Description

General ideas:

  • A custom Flink source, embedded in the consumer, that handles the ID generation phase (roughly, SaneitizeJobs.php). It should generate IDs on timers so that all IDs are slowly produced over a configured time period.
    • The list of wikis should be sourced from the consumer's wikis parameter. When no wiki filter is in place, source it from the sitematrix on metawiki, or from noc.wikimedia.org.
    • The max page ID should be queried from the wiki somehow. Alternatively we could ask Elasticsearch, but that is less of a guarantee.
  • Expose the CirrusSearch checker through the MediaWiki API. Flink should call this API with the list of IDs and receive the results of the check.
  • Potentially generate a side output of found errors, for future debugging.
  • Generate update events that flow into the consumer fetch phase to fix up the errors found.
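
The timer-driven ID generation in the first idea can be sketched roughly as follows. This is only an illustration in Python, not the Flink source itself: `LoopConfig`, `batches`, and all parameter names are hypothetical, and it assumes page IDs start at 1 and that the max page ID has already been obtained from the wiki.

```python
from dataclasses import dataclass
from typing import Iterator, Tuple


@dataclass(frozen=True)
class LoopConfig:
    """Hypothetical settings for one full pass over a wiki's page IDs."""
    max_page_id: int   # highest page ID reported by the wiki (assumed input)
    loop_seconds: int  # time budget for visiting every ID once
    tick_seconds: int  # interval between timer firings


def batches(cfg: LoopConfig) -> Iterator[Tuple[int, int]]:
    """Yield one inclusive (first_id, last_id) range per timer tick."""
    ticks = max(1, cfg.loop_seconds // cfg.tick_seconds)
    # Ceiling division, so every ID is covered within the tick budget.
    per_tick = max(1, -(-cfg.max_page_id // ticks))
    first = 1
    while first <= cfg.max_page_id:
        last = min(first + per_tick - 1, cfg.max_page_id)
        yield first, last
        first = last + 1


if __name__ == "__main__":
    cfg = LoopConfig(max_page_id=1000, loop_seconds=3600, tick_seconds=60)
    for first, last in batches(cfg):
        print(f"check page IDs {first}..{last}")
```

Each range would be handed to the sanity check API call. Spreading the ranges evenly over the loop keeps the extra load on the wiki roughly constant.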

Details

Title | Reference | Author | Source Branch | Dest Branch
Revert workaround for UpdateEvent.changeType = PAGE_RERENDER_UPSERT | repos/search-platform/cirrus-streaming-updater!123 | pfischer | revert-upsert-workaround | main
Pause saneitizer without changing graph shape | repos/search-platform/cirrus-streaming-updater!122 | ebernhardson | work/ebernhardson/pause-saneitizer | main
Saneitizer: Better handling of bad responses | repos/search-platform/cirrus-streaming-updater!120 | ebernhardson | work/ebernhardson/saneitizer-problems | main
Encode failed saneitizer updates with UpdateEvent.changeType = PAGE_RERENDER_UPSERT | repos/search-platform/cirrus-streaming-updater!119 | pfischer | encode-page-rerender-upsert-fetch-failure | main
Saneitizer: Assign timestamps when creating events | repos/search-platform/cirrus-streaming-updater!117 | ebernhardson | work/ebernhardson/saneitizer-fixup | main
Match sanity check response handling to real responses | repos/search-platform/cirrus-streaming-updater!116 | ebernhardson | work/ebernhardson/saneitizer-fixup | main
Implement Saneitizer | repos/search-platform/cirrus-streaming-updater!110 | ebernhardson | work/ebernhardson/saneitizer-source | main
Extract a blocking mw http client from CirrusNamespaceIndexMap | repos/search-platform/cirrus-streaming-updater!109 | ebernhardson | work/ebernhardson/mw-http-client | main

Event Timeline

Change 1006997 had a related patch set uploaded (by Ebernhardson; author: Ebernhardson):

[mediawiki/extensions/CirrusSearch@master] Expose sanity checker over mw api

https://gerrit.wikimedia.org/r/1006997

Gehel set the point value for this task to 13.

Change 1012450 had a related patch set uploaded (by Ebernhardson; author: Ebernhardson):

[mediawiki/extensions/CirrusSearch@master] Provide max page id in api query site info general info

https://gerrit.wikimedia.org/r/1012450

Change 1012450 merged by jenkins-bot:

[mediawiki/extensions/CirrusSearch@master] Provide max page id in api query site info general info

https://gerrit.wikimedia.org/r/1012450

Change #1006997 merged by jenkins-bot:

[mediawiki/extensions/CirrusSearch@master] Expose sanity checker over mw api

https://gerrit.wikimedia.org/r/1006997

Change #1019833 had a related patch set uploaded (by Ebernhardson; author: Ebernhardson):

[operations/deployment-charts@master] cirrus: Enable saneitizer on consumer-cloudelastic

https://gerrit.wikimedia.org/r/1019833

Change #1019833 merged by jenkins-bot:

[operations/deployment-charts@master] cirrus: Enable saneitizer on consumer-cloudelastic

https://gerrit.wikimedia.org/r/1019833

Change #1019935 had a related patch set uploaded (by Ebernhardson; author: Ebernhardson):

[operations/deployment-charts@master] cirrus: Update container for saneitizer

https://gerrit.wikimedia.org/r/1019935

Change #1019935 merged by jenkins-bot:

[operations/deployment-charts@master] cirrus: Update container for saneitizer

https://gerrit.wikimedia.org/r/1019935

pfischer opened https://gitlab.wikimedia.org/repos/search-platform/cirrus-streaming-updater/-/merge_requests/119

Encode failed saneitizer updates with UpdateEvent.changeType = PAGE_RERENDER_UPSERT

Change #1020259 had a related patch set uploaded (by Peter Fischer; author: Peter Fischer):

[schemas/event/primary@master] cirrussearch/update_pipeline/update add change_type.PAGE_RERENDER_UPSERT enum constant

https://gerrit.wikimedia.org/r/1020259

ebernhardson merged https://gitlab.wikimedia.org/repos/search-platform/cirrus-streaming-updater/-/merge_requests/119

Encode failed saneitizer updates with UpdateEvent.changeType = PAGE_RERENDER_UPSERT

Change #1020259 merged by jenkins-bot:

[schemas/event/primary@master] cirrussearch/update_pipeline/update add change_type.PAGE_RERENDER_UPSERT enum constant

https://gerrit.wikimedia.org/r/1020259

The initial deployment has been a bit rocky; in particular, the saneitizer is visiting pages with error states we haven't seen in normal operation yet. Overall this is probably good: we would have run into pages with these error states eventually, and the saneitizer is simply speeding that process up. The pipeline has been running for a couple of hours now without issues. If it's still running without restarts by tomorrow, we can probably consider the initial deployment complete.

One potential improvement we talked about: the initial method of configuring the saneitizer adds new pieces to the Flink execution graph. This means that pausing saneitization requires playing around with some dangerous options, and the current saneitization state is lost in the process. We should change the flag that toggles saneitization so that the saneitizer always stays connected to the graph, but never emits any events or state changes when disabled. The general idea is that the shape of the graph should not change due to configuration changes, as graph shape changes require careful deployments.
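
The idea of a stage that is always wired in but inert when disabled could look something like the sketch below. It is written in plain Python rather than against the Flink API, and `SaneitizerStage`, `on_timer`, and the field names are hypothetical stand-ins for illustration only. `next_id` plays the role of the checkpointed saneitization state.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class SaneitizerStage:
    """Hypothetical stage that always exists in the graph, enabled or not."""
    enabled: bool
    max_page_id: int
    batch_size: int
    next_id: int = 1  # loop position; stands in for checkpointed state

    def on_timer(self) -> List[int]:
        """Return the page IDs to check on this tick, or nothing when paused."""
        if not self.enabled or self.max_page_id < 1:
            # Paused: emit no events and leave the state untouched.
            return []
        last = min(self.next_id + self.batch_size - 1, self.max_page_id)
        ids = list(range(self.next_id, last + 1))
        # Wrap around to start a new loop once the final ID has been emitted.
        self.next_id = 1 if last >= self.max_page_id else last + 1
        return ids
```

Because the stage is present regardless of the flag, toggling `enabled` is a plain configuration change, and resuming picks up from the preserved `next_id` rather than restarting the loop.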

Gehel claimed this task.