
Deploy the new Cirrus Updater to update select wikis in cloudelastic
Closed, Resolved; Public

Description

General plan:

  1. Choose the set of wikis to update
    • Potentially: wikidatawiki, commonswiki, frwiki, itwiki, testwiki
  2. Enable page rerender events for the selected wikis. We may need to reduce the set of wikis, depending on operational limits
  3. Enable producer and consumer-cloudelastic in the production eqiad k8s cluster for selected wikis
  4. Disable Cirrus update process on those wikis (needs patch?)
  5. Monitor the difference between the eqiad and cloudelastic clusters via the compare-clusters.py script in Cirrus
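
As a rough illustration of what step 5 is checking for, the sketch below diffs two snapshots of indexed pages. It is not the logic of compare-clusters.py; the function name, the `{page_id: revision_id}` snapshot shape, and the sample data are all assumptions made for the example.

```python
def diff_indices(primary, secondary):
    """Compare two hypothetical {page_id: revision_id} snapshots.

    Returns page ids missing from the secondary, present only in the
    secondary, and present in both but at different revisions.
    """
    primary_ids = set(primary)
    secondary_ids = set(secondary)
    missing = sorted(primary_ids - secondary_ids)
    extra = sorted(secondary_ids - primary_ids)
    stale = sorted(
        page_id
        for page_id in primary_ids & secondary_ids
        if primary[page_id] != secondary[page_id]
    )
    return {"missing": missing, "extra": extra, "stale": stale}


# Made-up sample data: eqiad is treated as the reference cluster.
eqiad = {1: 100, 2: 205, 3: 310}
cloudelastic = {1: 100, 2: 200, 4: 400}
report = diff_indices(eqiad, cloudelastic)
print(report)
```

A growing "missing" or "stale" list over time would suggest the new pipeline is dropping or delaying updates relative to eqiad.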

Deployment plan / sequence of events:

  1. page rerender topic is configured in T351503
  2. rerenders enabled for select wikis in https://gerrit.wikimedia.org/r/979155
  3. the flink-app chart is updated to use the latest service mesh templates from https://gerrit.wikimedia.org/r/c/operations/puppet/+/981309 so that the envoy sidecar container can access cloudelastic (done in https://gerrit.wikimedia.org/r/c/operations/deployment-charts/+/982823)
  4. producer and consumer-cloudelastic configured and deployed in https://gerrit.wikimedia.org/r/979147
    • At this point writes are flowing from the new pipeline into cloudelastic
  5. Disable writes from CirrusSearch to cloudelastic for select wikis in https://gerrit.wikimedia.org/r/979146

Event Timeline

Disable Cirrus update process on those wikis (needs patch?)

We can use $wgCirrusSearchDisableUpdate to do a full disable, but that's too much for this use case. Instead we should set $wgCirrusSearchWriteClusters to include only eqiad and codfw, but not cloudelastic, on the appropriate wikis.
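
The per-wiki override described above could be modelled roughly as follows. This is a Python sketch of the selection logic only, not actual mediawiki-config; the constant names, the helper function, and the assumption that the default write list is eqiad, codfw and cloudelastic are all illustrative.

```python
# Assumed default: every wiki writes to all three clusters.
DEFAULT_WRITE_CLUSTERS = ["eqiad", "codfw", "cloudelastic"]

# Hypothetical set of wikis whose cloudelastic writes move to the new updater.
MIGRATED_WIKIS = {"testwiki", "frwiki", "itwiki", "commonswiki", "wikidatawiki"}


def write_clusters_for(wiki):
    """Return the clusters the existing Cirrus update process should write to."""
    if wiki in MIGRATED_WIKIS:
        # Keep eqiad and codfw, drop only cloudelastic.
        return [c for c in DEFAULT_WRITE_CLUSTERS if c != "cloudelastic"]
    return list(DEFAULT_WRITE_CLUSTERS)


print(write_clusters_for("frwiki"))
print(write_clusters_for("enwiki"))
```

Unlike a full disable, the selected wikis keep their eqiad and codfw writes untouched.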

Change 979146 had a related patch set uploaded (by Ebernhardson; author: Ebernhardson):

[operations/mediawiki-config@master] cirrus: Disable cloudelastic writes on selected wikis

https://gerrit.wikimedia.org/r/979146

Rough justification for the set of wikis:

  • testwiki - standard test deployment
  • itwiki, frwiki - representative of typical Wikipedia edits
  • commonswiki, wikidatawiki - special-case wikis that might behave differently than the typical instances.

Change 979147 had a related patch set uploaded (by Ebernhardson; author: Ebernhardson):

[operations/deployment-charts@master] cirrus updater: Expand test deployment to prod+cloudelastic

https://gerrit.wikimedia.org/r/979147

Change 979155 had a related patch set uploaded (by Ebernhardson; author: Ebernhardson):

[operations/mediawiki-config@master] cirrus: Enable event bus bridge on more wikis

https://gerrit.wikimedia.org/r/979155

Ideally we want a pre-deploy size estimate of the refresh topic, primarily so we can verify the accuracy of our estimate of the total size once refreshes are enabled on all wikis.

It looks like we are estimating the page rerender events to arrive at approximately the same rate as the existing cirrusSearchLinksUpdate jobs. Below are some related stats, estimated from one week (Nov 27 to Dec 3) of Kafka history. They reuse the prior estimate of 3 copies of the data at 0.6kB per event with 7 days retention. I added the "minus commons" row since commons is the largest of the selected wikis, giving an option to reduce the initial rollout.

|                | 7 day total | per second | % of total | est. topic size (GB) |
| all wikis      | 186M        | 307        | 100%       | 319                  |
| selected wikis | 46M         | 76         | 24%        | 78                   |
| minus commons  | 24M         | 40         | 13%        | 41                   |
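
The arithmetic behind these figures can be sketched as below. The function names are made up for illustration, and reading "0.6kB" as 0.6 × 1024 bytes and the size column as GiB is an assumption; it is simply the interpretation that lands closest to the table. Because the retention period equals the one-week sample window, the weekly event count is used directly as the number of retained events.

```python
SECONDS_PER_WEEK = 7 * 24 * 60 * 60


def events_per_second(events_per_week):
    """Average event rate over a one-week sample."""
    return events_per_week / SECONDS_PER_WEEK


def topic_size_gib(events_in_retention, bytes_per_event=0.6 * 1024, copies=3):
    """Estimated topic size: retained events x event size x number of copies."""
    return events_in_retention * bytes_per_event * copies / 1024**3


# Rounded weekly totals taken from the table above.
weekly_totals = [
    ("all wikis", 186e6),
    ("selected wikis", 46e6),
    ("minus commons", 24e6),
]
for label, weekly in weekly_totals:
    rate = events_per_second(weekly)
    size = topic_size_gib(weekly)
    print(f"{label}: {rate:.1f} events/s, {size:.1f} GiB")
```

With the rounded weekly totals as input, each result lands within about one unit of the corresponding table figure; the remaining gap is presumably because the table was computed from unrounded counts.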

Change 979155 merged by jenkins-bot:

[operations/mediawiki-config@master] cirrus: Enable event bus bridge on more wikis

https://gerrit.wikimedia.org/r/979155

Mentioned in SAL (#wikimedia-operations) [2023-12-04T21:39:51Z] <ebernhardson@deploy2002> Started scap: Backport for [[gerrit:979155|cirrus: Enable event bus bridge on more wikis (T352335)]]

Mentioned in SAL (#wikimedia-operations) [2023-12-04T21:41:07Z] <ebernhardson@deploy2002> ebernhardson: Backport for [[gerrit:979155|cirrus: Enable event bus bridge on more wikis (T352335)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)

Mentioned in SAL (#wikimedia-operations) [2023-12-04T21:49:15Z] <ebernhardson@deploy2002> Finished scap: Backport for [[gerrit:979155|cirrus: Enable event bus bridge on more wikis (T352335)]] (duration: 09m 23s)

Change 992974 had a related patch set uploaded (by Ebernhardson; author: Ebernhardson):

[operations/mediawiki-config@master] cirrus: Disable cloudelastic writes to testwiki and mw.org

https://gerrit.wikimedia.org/r/992974

Change 979147 merged by jenkins-bot:

[operations/deployment-charts@master] cirrus updater: Expand test deployment to prod+cloudelastic

https://gerrit.wikimedia.org/r/979147

Change 992983 had a related patch set uploaded (by Ebernhardson; author: Ebernhardson):

[operations/deployment-charts@master] flink-operator: Add cirrus-streaming-updater to prod watched namespaces

https://gerrit.wikimedia.org/r/992983

Change 992983 merged by jenkins-bot:

[operations/deployment-charts@master] flink-operator: Add cirrus-streaming-updater to prod watched namespaces

https://gerrit.wikimedia.org/r/992983

Change 993028 had a related patch set uploaded (by Ebernhardson; author: Ebernhardson):

[operations/deployment-charts@master] cirrus-updater: Increase producer memory from 2g to 3g

https://gerrit.wikimedia.org/r/993028

Change 993032 had a related patch set uploaded (by Ebernhardson; author: Ebernhardson):

[operations/deployment-charts@master] cirrus-updater: Normalize kafka configuration

https://gerrit.wikimedia.org/r/993032

Change 993032 merged by jenkins-bot:

[operations/deployment-charts@master] cirrus-updater: Normalize kafka configuration

https://gerrit.wikimedia.org/r/993032

Change 993045 had a related patch set uploaded (by Ebernhardson; author: Ebernhardson):

[operations/deployment-charts@master] cirrus updater: Configure http routes for prod clusters

https://gerrit.wikimedia.org/r/993045

Change 993045 merged by jenkins-bot:

[operations/deployment-charts@master] cirrus updater: Configure http routes for prod clusters

https://gerrit.wikimedia.org/r/993045

Change 993028 abandoned by Ebernhardson:

[operations/deployment-charts@master] cirrus-updater: Increase producer memory from 2g to 3g

Reason:

Unnecessary. For some reason Kafka gives memory errors if it connects to a TLS endpoint when PLAINTEXT is configured.

https://gerrit.wikimedia.org/r/993028
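
One commonly reported explanation for this kind of memory error, not confirmed as the cause here, is that a plaintext Kafka client reads the first four bytes of a TLS reply as the big-endian int32 size prefix of a Kafka response and tries to allocate a buffer of that size. The sketch below shows what that number would be for a TLS 1.2 alert record; the specific bytes are an illustrative assumption, not something captured from this deployment.

```python
import struct

# First four bytes of a TLS 1.2 alert record: content type 0x15 (alert),
# protocol version 0x0303, then the high byte of the record length.
tls_alert_prefix = bytes([0x15, 0x03, 0x03, 0x00])

# Kafka responses start with a 4-byte big-endian signed size.
(claimed_size,) = struct.unpack(">i", tls_alert_prefix)

print(claimed_size)
print(claimed_size / 2**20)
```

A misread size of that magnitude (roughly 336 MiB) would explain why raising the producer memory from 2g to 3g looked like a fix while the real problem was the listener protocol mismatch.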

Change 979146 abandoned by Ebernhardson:

[operations/mediawiki-config@master] cirrus: Disable cloudelastic writes on selected wikis

Reason:

https://gerrit.wikimedia.org/r/979146

Change 993754 had a related patch set uploaded (by Ebernhardson; author: Ebernhardson):

[operations/deployment-charts@master] cirrus updater: Remove consumer-devnull service

https://gerrit.wikimedia.org/r/993754

Change 993755 had a related patch set uploaded (by Ebernhardson; author: Ebernhardson):

[operations/deployment-charts@master] cirrus: Expand production deployment wikis

https://gerrit.wikimedia.org/r/993755

Change 993754 merged by jenkins-bot:

[operations/deployment-charts@master] cirrus updater: Remove consumer-devnull service

https://gerrit.wikimedia.org/r/993754

Change 993755 merged by jenkins-bot:

[operations/deployment-charts@master] cirrus: Expand production deployment wikis

https://gerrit.wikimedia.org/r/993755

Change 993788 had a related patch set uploaded (by Ebernhardson; author: Ebernhardson):

[operations/deployment-charts@master] cirrus updater: Apply consumer throughput configuration

https://gerrit.wikimedia.org/r/993788

Change 993788 merged by jenkins-bot:

[operations/deployment-charts@master] cirrus updater: Apply consumer throughput configuration

https://gerrit.wikimedia.org/r/993788

Change 992974 merged by jenkins-bot:

[operations/mediawiki-config@master] cirrus: Disable cloudelastic writes to testwiki and mw.org

https://gerrit.wikimedia.org/r/992974

Mentioned in SAL (#wikimedia-operations) [2024-01-29T21:09:05Z] <catrope@deploy2002> Started scap: Backport for [[gerrit:992974|cirrus: Disable cloudelastic writes to testwiki and mw.org (T352335)]]

Mentioned in SAL (#wikimedia-operations) [2024-01-29T21:10:27Z] <catrope@deploy2002> ebernhardson and catrope: Backport for [[gerrit:992974|cirrus: Disable cloudelastic writes to testwiki and mw.org (T352335)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)

Mentioned in SAL (#wikimedia-operations) [2024-01-29T21:17:46Z] <catrope@deploy2002> Finished scap: Backport for [[gerrit:992974|cirrus: Disable cloudelastic writes to testwiki and mw.org (T352335)]] (duration: 08m 40s)

The selected set of wikis has been enabled in production and is performing writes. Issues resulting from this deployment will be dealt with in separate tickets.

Change 998559 had a related patch set uploaded (by Ebernhardson; author: Ebernhardson):

[operations/mediawiki-config@master] cirrus: Re-enable writes to wikidata on cloudelastic

https://gerrit.wikimedia.org/r/998559

Change 998559 merged by jenkins-bot:

[operations/mediawiki-config@master] cirrus: Re-enable writes to wikidata on cloudelastic

https://gerrit.wikimedia.org/r/998559

Mentioned in SAL (#wikimedia-operations) [2024-02-07T22:07:32Z] <ebernhardson@deploy2002> Started scap: Backport for [[gerrit:998559|cirrus: Re-enable writes to wikidata on cloudelastic (T352335)]]

Mentioned in SAL (#wikimedia-operations) [2024-02-07T22:09:08Z] <ebernhardson@deploy2002> ebernhardson: Backport for [[gerrit:998559|cirrus: Re-enable writes to wikidata on cloudelastic (T352335)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)

Mentioned in SAL (#wikimedia-operations) [2024-02-07T22:16:42Z] <ebernhardson@deploy2002> Finished scap: Backport for [[gerrit:998559|cirrus: Re-enable writes to wikidata on cloudelastic (T352335)]] (duration: 09m 10s)

Change 999962 had a related patch set uploaded (by Ebernhardson; author: Ebernhardson):

[operations/mediawiki-config@master] cirrus: Re-enable cloudelastic writes for non-testwikis

https://gerrit.wikimedia.org/r/999962

Change 999962 abandoned by Ebernhardson:

[operations/mediawiki-config@master] cirrus: Re-enable cloudelastic writes for non-testwikis

Reason:

Root cause fixed in I3d6282f6; this patch is not necessary.

https://gerrit.wikimedia.org/r/999962