Rewrite mw-warmup.js in Python
Closed, ResolvedPublic

Description

[…]

  • Timo to start rewriting mw-warmup in Python and add a "single host" command (e.g. instead of running against a whole cluster).

Event Timeline

Documenting this for posterity -- we agreed back in T269179 that when rewriting it in Python, it would be a good idea to move it directly into Spicerack's mediawiki module (or somewhere like that) rather than keep it in the puppet repo and shell out to it.

Unassigning for now. I had originally intended to take this on as a side-project to toy around more with Python. But both due to an overall lack of time and because I had forgotten about Spicerack, I think it might be better for SRE to take this on. Both so that you'd have familiarity with it from the get-go (instead of inheriting my quirks), and because I'm not sure it's the best use of staff time for me specifically to get familiar with Spicerack.

RLazarus claimed this task.
RLazarus added subscribers: Clement_Goubert, Volans, Joe.

I'm working on this.

@Volans @Joe We talked about building this into Spicerack, but there's one complication: as-is, it runs on the maintenance host, so the requests are all within the same DC. If we moved it into Spicerack, it could run on either cumin host -- so if dc_to is codfw but the cookbook happens to be running on cumin1001, we'll pay the cross-DC latency penalty on each roundtrip, which might (untested) add up to a significant slowdown over an entire warmup sequence. That isn't a showstopper (warmup happens before the critical period, and multi-dc means we don't always have to do it, even) but it sounds annoying.
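To make the "might add up" concern concrete, here is a back-of-the-envelope sketch. The function name and every number in it are hypothetical placeholders, not measurements from this task, and it assumes the round trips happen one after another (any concurrency in the warmup would shrink the result).

```python
def extra_warmup_seconds(sequential_round_trips: int,
                         cross_dc_rtt_ms: float,
                         same_dc_rtt_ms: float) -> float:
    """Extra wall-clock time from paying the cross-DC RTT on every
    sequential round trip instead of the same-DC RTT."""
    per_request_penalty_ms = cross_dc_rtt_ms - same_dc_rtt_ms
    return sequential_round_trips * per_request_penalty_ms / 1000.0


# Hypothetical inputs, for illustration only: 10,000 sequential round trips,
# 30 ms cross-DC RTT, 0.5 ms same-DC RTT.
penalty = extra_warmup_seconds(10_000, 30.0, 0.5)
print(f"extra time: {penalty:.0f} s")
```

Plugging in real request counts and measured RTTs would show whether the slowdown is actually significant.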

Some options:

  • Keep the code in Puppet and run_sync it on the dc_to maintenance host, as we currently do -- just written in Python instead of JS.
  • Put it in Spicerack and document that the cookbook (and therefore the entire switchover) should be run from dc_to. (Or dc_from, if it's a live_test.) The warmup can start with a warning like "This might be slow, you should run from the other DC, continue anyway? y/n"
  • (from @Clement_Goubert) Put it in Spicerack, and if the cookbook is running in the wrong DC, use run_sync to execute it on the other cumin host. That means a cookbook effectively shelling out to run a cookbook on the other host, which feels weird, but it should work.
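The "wrong DC" check that the second and third options rely on could look roughly like the sketch below. It is not Spicerack code: the helper names are hypothetical, and the mapping from the first digit of the host number to a data center (1 for eqiad, 2 for codfw) is an assumption made for illustration.

```python
import re
from typing import Optional

# Assumption: the first digit of the host number encodes the data center.
DC_BY_FIRST_DIGIT = {"1": "eqiad", "2": "codfw"}


def local_dc(hostname: str) -> Optional[str]:
    """Guess the DC from a hostname such as 'cumin1001'; None if unknown."""
    short_name = hostname.split(".")[0]
    match = re.match(r"[a-z-]+(\d)\d+$", short_name)
    if match is None:
        return None
    return DC_BY_FIRST_DIGIT.get(match.group(1))


def preferred_dc(dc_from: str, dc_to: str, live_test: bool = False) -> str:
    """Warmup targets dc_to, or dc_from when this is a live_test."""
    return dc_from if live_test else dc_to


def warmup_warning(hostname: str, dc_from: str, dc_to: str,
                   live_test: bool = False) -> Optional[str]:
    """Return a prompt if the cookbook host is not in the DC being warmed up.

    A host whose DC cannot be determined is treated as remote.
    """
    wanted = preferred_dc(dc_from, dc_to, live_test)
    if local_dc(hostname) == wanted:
        return None
    return (f"This might be slow ({hostname} is not in {wanted}), "
            "you should run from the other DC, continue anyway? y/n")


# The case from the comment above: dc_to is codfw, cookbook runs on cumin1001.
print(warmup_warning("cumin1001", dc_from="eqiad", dc_to="codfw"))
```

For the third option, the same check would decide whether to hand off to the other cumin host instead of printing a prompt.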

I'm inclined to start out by doing the first thing -- rewrite warmup.js into warmup.py to start with, and keep it on the maintenance host, in part to do a side-by-side comparison and make sure nothing gets worse in the port. After that, it's a relatively minor change to move the code into the Spicerack repo, and we can decide what to do about the cross-DC issue. Thoughts?

> I'm inclined to start out by doing the first thing -- rewrite warmup.js into warmup.py to start with, and keep it on the maintenance host, in part to do a side-by-side comparison and make sure nothing gets worse in the port. After that, it's a relatively minor change to move the code into the Spicerack repo, and we can decide what to do about the cross-DC issue. Thoughts?

I totally agree, start with just the conversion and then we can decide how to run it. It doesn't necessarily need to be inside Spicerack, and if deemed more natural I think it's totally ok to have it just as a script on the maintenance hosts, run by a cookbook via run_sync.

@Krinkle Are you aware of any current uses of warmup.js besides the DC switchover automation? Anywhere else I need to maintain compatibility, or adapt either humans or software to call the new script?

> @Krinkle Are you aware of any current uses of warmup.js besides the DC switchover automation? Anywhere else I need to maintain compatibility, or adapt either humans or software to call the new script?

No, it's exclusively for DC switchover automation.

Change 890299 had a related patch set uploaded (by RLazarus; author: RLazarus):

[operations/puppet@production] mediawiki-cache-warmup: Rewrite in Python

https://gerrit.wikimedia.org/r/890299

[not a blocker, not for next week]
Thinking a bit more about it, I'd like to suggest an alternative approach for future usage: integrate it more closely into the normal depool/pool workflow for hosts and have it run automatically before re-pooling a server, if that's not overkill and/or httpbb is not already enough for that.
On one hand that would ensure that the host works fine, and on the other it would pre-load all the caches so the host is not put back into the rotation "cold".
I guess we can find a similar way to integrate that inside the k8s world too.

With this idea in mind I would instead deploy a script on all mediawiki hosts and then:

  • for the global warmup run in "all urls to all hosts" mode just run it via spicerack's remote from the cookbook on all hosts
  • for the global warmup in "spread" mode run it once from the cumin host (or any host), just telling the script to use the svc record instead of localhost.
  • have the pool command run it before re-pooling and do the same when pooling via confctl from spicerack/cookbooks
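A rough sketch of how one script could serve both global modes described in the list above. The function, the mode names "all-hosts" and "spread", and the example hostnames are all hypothetical; the real warmup script may be structured quite differently.

```python
from typing import List, Sequence, Tuple


def plan_requests(urls: Sequence[str], mode: str,
                  hosts: Sequence[str] = (),
                  service: str = "") -> List[Tuple[str, str]]:
    """Return (target, url) pairs describing which requests to send where.

    "all-hosts": every URL goes to every individual host.
    "spread":    every URL goes once to the service address, leaving it to
                 the load balancer to spread requests across hosts.
    """
    if mode == "all-hosts":
        return [(host, url) for host in hosts for url in urls]
    if mode == "spread":
        return [(service, url) for url in urls]
    raise ValueError(f"unknown mode: {mode!r}")


# Hypothetical example data.
urls = ["/wiki/Main_Page", "/w/load.php"]
hosts = ["mw-a.example", "mw-b.example"]

for target, url in plan_requests(urls, "all-hosts", hosts=hosts):
    print(target, url)
for target, url in plan_requests(urls, "spread", service="appservers.svc.example"):
    print(target, url)
```

Under this sketch, the pre-repool case is the "all-hosts" plan with a single target of localhost.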

Thoughts?

Change 891520 had a related patch set uploaded (by Clément Goubert; author: Clément Goubert):

[operations/cookbooks@master] sre.switchdc.mediawiki: use python warmup script

https://gerrit.wikimedia.org/r/891520

Sure, we could look at adding a warmup step to the server repool process. Historically we haven't worried about it, because the impact for one host is much smaller than when the entire cluster is cold, but it's worth looking at. I'd rather make this change in-place first though, and then it would be easy to also install something on each host. (We might choose to wait and do this after mw-on-k8s, but we can take a look at it for sure.)

For the "spread" mode, it sounds like the only change you're suggesting is to run it on the cumin host instead of mwmaint as we currently do? That tracks, especially since we were talking about maybe building this into spicerack anyway, but again I'd rather do the straight port on mwmaint, and then proceed with moving it around.

Change 890299 merged by RLazarus:

[operations/puppet@production] mediawiki-cache-warmup: Rewrite in Python

https://gerrit.wikimedia.org/r/890299

> Sure, we could look at adding a warmup step to the server repool process. Historically we haven't worried about it, because the impact for one host is much smaller than when the entire cluster is cold, but it's worth looking at. I'd rather make this change in-place first though, and then it would be easy to also install something on each host. (We might choose to wait and do this after mw-on-k8s, but we can take a look at it for sure.)

Agree that it could make more sense to do that only in the k8s world. As for the 'one host' point: if we restart all php-fpm during a deploy, that is 'all hosts', just one at a time :D

> For the "spread" mode, it sounds like the only change you're suggesting is to run it on the cumin host instead of mwmaint as we currently do? That tracks, especially since we were talking about maybe building this into spicerack anyway, but again I'd rather do the straight port on mwmaint, and then proceed with moving it around.

I didn't want to push for using the cumin hosts instead of other hosts, that's not a big deal, I just wanted to acknowledge that the spread mode should be run from a central host. Happy to revisit later.

Change 891520 merged by jenkins-bot:

[operations/cookbooks@master] sre.switchdc.mediawiki: use python warmup script

https://gerrit.wikimedia.org/r/891520

Live-tested, codfw warmup takes around 3 minutes.