Page MenuHomePhabricator

Prevent too many parsercache sections from being depooled
Closed, ResolvedPublic

Description

Currently, dbctl allows for all ParserCache sections to be depooled. That can...bring everything down. We need to add a check to prevent that from happening. At least two PC sections should be always pooled. Follow up to T383137: Allow full depool of pc sections in dbctl

Details

Related Changes in Gerrit:
Related Changes in GitLab:
TitleReferenceAuthorSource BranchDest Branch
DbConfig now requires config at constructionrepos/sre/conftool!60swfrenchwork/swfrench/dbctl-config-cleanupmain
Prepare release 5.0.0repos/sre/conftool!54swfrenchwork/swfrench/release-5.0.0main
Add minimum pooled parsercache sections checkrepos/sre/conftool!53swfrenchwork/swfrench/parsercache-min-pooled-checkmain
Release 4.2.0repos/sre/conftool!51cdanisrelease-4.2.0main
Add support for the "parsercache" section flavorrepos/sre/conftool!50swfrenchwork/swfrench/parsercache-flavormain
Customize query in GitLab

Event Timeline

Ladsgroup triaged this task as Medium priority.Jan 9 2025, 1:27 PM
Ladsgroup moved this task from Triage to In progress on the DBA board.

Having a mechanism in dbctl to prevent zero-weighing "too many" parsercache sections (T383137#10440507) sounds like a great addition.

It does raise some interesting questions about how to achieve it, though, since currently:

  1. dbctl doesn't have any validation checks that operate across sections of a given flavor.
  2. parsercache sections are simply treated as "external" flavor (i.e., alongside "real" external store sections).

Naively, we'll need to address both of these - e.g., imbue parsercache sections with a distinct flavor, and teach dbctl to support intra-flavor cross-section validation invariants.

@Marostegui, @Ladsgroup - Would either of you object if, as part of this work, we switch parsercache sections over to using "parsercache" as their enum-ish flavor attribute in the dbctl section object?

IMO, this would be cleaner than other options like string matching on section name in dbctl or imbuing the section object with some other discriminator.

They would still appear in the externalLoads attribute in the config object (i.e., this would be transparent to mediawiki-config), but it would open the door to moving them out eventually if desired (which would perhaps make sense given that they should never appear in the externalLoads of the wgLBFactoryConf).

If this sounds reasonable, I'm happy to give this a try.

@Scott_French I assume this change would be just a "noop" in the way we operate/deploy/depool/pool right?

@Marostegui - Exactly, yes: The only way it would change your workflow would be on creation of a new section. Just like now, when conftool-sync creates a new section object, it will default the flavor to "regular". The difference is that, when editing that object to initially populate it, you would set it to "parsercache" instead of "external".

Thanks for clarifying @Scott_French - fine by me if Amir doesn't see any potential MW issues there.

Since it's transparent to mediawiki config side, I'm totally happy with that!

Great, thank you both! I'll post the first set of patches for this shortly.

While we could structure this to require only a single conftool release, IMO it's cleaner / safer to do this in two steps:

  1. Update the JSON schema to allow the "parsercache" flavor and deploy a new conftool that supports it. Then, update flavor on the existing parsercache sections, which is a noop (i.e., will result in no diff in the generated config).
  2. Deploy a new conftool that supports (and enables) the safety check. Since #1 is already done, the safety check is immediately satisfied.

The other option is to land these in a single conftool release, where we temporarily disable the safety check (e.g., set the minimum pooled sections count to zero) until flavor is updated. That's totally doable, but adds a bit more to think / worry about.

@CDanis FYI

Change #1110880 had a related patch set uploaded (by Scott French; author: Scott French):

[operations/puppet@production] P:conftool: allow the parsercache section flavor

https://gerrit.wikimedia.org/r/1110880

Change #1110880 merged by Scott French:

[operations/puppet@production] P:conftool: allow the parsercache section flavor

https://gerrit.wikimedia.org/r/1110880

Alright, both the new conftool release and the JSON schema change to allow the parsercache flavor are live as of ~ 20:00 UTC.

That means it should be fine to go ahead and switch parsercache sections over to the new flavor, but it's probably prudent to wait a bit just in case any unexpected issues pop up with the new conftool release.

While that's unlikely given the content of the new release, it would put us in an awkward state if we need to roll back (i.e., dbctl would no longer know how to map the sections to the appropriate config property).

Alright, both the new conftool release and the JSON schema change to allow the parsercache flavor are live as of ~ 20:00 UTC.

That means it should be fine to go ahead and switch parsercache sections over to the new flavor, but it's probably prudent to wait a bit just in case any unexpected issues pop up with the new conftool release.

While that's unlikely given the content of the new release, it would put us in an awkward state if we need to roll back (i.e., dbctl would no longer know how to map the sections to the appropriate config property).

Thank you for this work. I'd be happy to coordinate with you to switch them to parsercache, let's try to aim for next week? So we have a couple of days of dbctl usage to make sure it is all good?

Thanks, @Marostegui - Aiming for early next week sounds good if that's preferable (FYI, I'll be out Monday).

Given the specific content of this conftool release, and the amount of use the tool has seen since the update yesterday, it's probably unlikely that an issue warranting a rollback will be discovered. That said, better safe than sorry :)

In the meantime, I'll post a patch for the validation check itself later today.

Sounds good! Thanks - I suggested next week cause:

  1. Tomorrow I am not sure I will have the time to double check after the deployment
  2. Friday...it is Friday.

If you feel this is unlikely to break, maybe you can go ahead on Thursday if @Ladsgroup is available.

@Scott_French just to let you know: I've deployed a new section of parsercache (pc6 - T383234) and next week I will do pc7 (T383235). pc6 is for now external and if we do this deployment, I will make sure pc7 is deployed as parsercache

@Marostegui - Great, thank you for the heads-up! I'll keep you posted on when the flavor switch happens.

I'll post a MR for a 5.0.0 release next week after the flavor switch happens.

As noted in the TODO introduced in https://gitlab.wikimedia.org/repos/sre/conftool/-/merge_requests/53, a follow-up patch to spicerack will be needed in order for the conftool config to be wired into DbConfig construction. That should happen after the 5.0.0 release to production and PyPI.

The plan for switching to the new flavor is relatively straightforward:

  1. Coordinate with data-persistence to confirm no ongoing work that may conflict.
  2. Choose a single site / pc section and switch flavor over to "parsercache"
  3. Verify that dbctl config diff produces nothing (i.e., the section change was a noop, as expected).
  4. Expand to the remaining pc sections.

Given the hour and #1, I'd propose we do this tomorrow (Wednesday).

Mentioned in SAL (#wikimedia-operations) [2025-01-22T15:11:51Z] <swfrench-wmf> switched dbctl pc section objects to flavor "parsercache" - T383324

Short end-of-week update:

Since no issues have shaken out following the flavor switch on Wednesday, I think we're good to move ahead with the 5.0.0 release next week, which will enable the feature.

I have a one-liner patch for spicerack.dbctl to resolve this TODO, which I'll post once 5.0.0 is released to PyPI.

Alright, once we address some unrelated issues that are blocking builds, we should have debian packages to deploy.

@Marostegui - Do you have any preferences / constraints on timing for when we release a new conftool that enables the safety check? Presumably this would be some time next week given the day :)

Edit: Issues are fixed and packages (5.0.1) are ready for inclusion on apt1002 when we're ready to move ahead.

Alright, once we address some unrelated issues that are blocking builds, we should have debian packages to deploy.

@Marostegui - Do you have any preferences / constraints on timing for when we release a new conftool that enables the safety check? Presumably this would be some time next week given the day :)

Not really, any time would probably work for me :)

Mentioned in SAL (#wikimedia-operations) [2025-02-05T15:51:19Z] <swfrench-wmf> finished deploying conftool 5.0.1-1 - T383324

This is now live, and I've updated https://wikitech.wikimedia.org/wiki/MariaDB/Troubleshooting#Depooling_a_parsercache_host accordingly.

Remaining items:

  • 1. We need to release conftool 5.0.1 to PyPI.
  • 2. I need to update spicerack's instantiation of DbConfig to wire in the config (blocked on #1).
  • 3. I need to resolve this TODO by making config non-optional (blocked on release of #2).

Change #1120648 had a related patch set uploaded (by Scott French; author: Scott French):

[operations/software/spicerack@master] dbctl: pass DbCtlConfiguration to DbConfig

https://gerrit.wikimedia.org/r/1120648

Change #1120648 merged by jenkins-bot:

[operations/software/spicerack@master] dbctl: pass DbCtlConfiguration to DbConfig

https://gerrit.wikimedia.org/r/1120648

Alright, I believe that should be all the remaining items tracked here.

A conftool release should be safe to move forward at any time, though it may make sense to wait until there's clarity around the DbConfig structured diff API (T383760). In any case, I'll track that separately.

Thanks, all!