Validate pybal config in CI
Open, Needs Triage, Public

Description

Creating this ticket based on an IRC discussion between myself, @ssingh and @CDanis from earlier this year.

We (Search Platform SRE) have inadvertently created invalid PyBal configuration via a Puppet patch at least twice. PyBal configuration is protected by a schema, so this does not break PyBal.

However, invalid PyBal config does set off errors such as ConfdResourceFailed: confd resource _srv_config-master_pybal_eqiad_search-https.toml. Recovering from it also requires rolling back Puppet patches and other operationally expensive actions.

Since we already have a schema, it makes sense to use the schema in CI so we can catch these errors before they are live in production.

Creating this ticket to:

  • Add pybal schema validation to CI
  • Verify operation
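As a rough illustration of what such a CI check could look like, here is a minimal sketch in Python. Everything in it is an assumption made for illustration: the field names in HYPOTHETICAL_SCHEMA, the shape of the service entries, and the function names are placeholders, not the actual pybal schema or any existing CI job.

```python
# Hypothetical schema: field name -> expected Python type.
# The real pybal schema is not shown in this task; these fields are placeholders.
HYPOTHETICAL_SCHEMA = {
    "port": int,
    "protocol": str,
    "enabled": bool,
}


def _type_ok(value, expected):
    # bool is a subclass of int in Python, so reject it unless bool is expected.
    if isinstance(value, bool) and expected is not bool:
        return False
    return isinstance(value, expected)


def validate_entry(name, entry, schema=HYPOTHETICAL_SCHEMA):
    """Return a list of error strings for one service entry (empty if valid)."""
    errors = []
    for field, expected in schema.items():
        if field not in entry:
            errors.append(f"{name}: missing required field '{field}'")
        elif not _type_ok(entry[field], expected):
            errors.append(
                f"{name}: field '{field}' should be {expected.__name__}, "
                f"got {type(entry[field]).__name__}"
            )
    return errors


def validate_config(services, schema=HYPOTHETICAL_SCHEMA):
    """Validate every entry in a name -> entry mapping and collect all errors."""
    errors = []
    for name in sorted(services):
        errors.extend(validate_entry(name, services[name], schema))
    return errors


def run_check(services):
    """Print each error and return a CI-style exit code (0 = pass, 1 = fail)."""
    errors = validate_config(services)
    for error in errors:
        print(f"ERROR: {error}")
    return 1 if errors else 0
```

A CI job could load the rendered configuration into a dict, pass it to run_check, and exit with the returned code so that a schema violation fails the patch before it merges. How the configuration is actually rendered and loaded is left out here because it depends on the real pipeline.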

Event Timeline

bking renamed this task from Consider validating pybal config in CI to Validate pybal config in CI. (Sep 23 2025, 1:57 PM)
bking added a project: Traffic.
bking updated the task description.
bking added subscribers: ssingh, CDanis.

Hi @bking. Thanks for the task! Do note that regardless of the task above, PyBal will soon be deprecated in favour of Liberica, once T368544 is resolved, which should happen within a few quarters.

I am just mentioning this so that we can factor it in before doing any more work or investing engineering time on PyBal, since the plan is to deprecate it soon. (Only eqiad and codfw have PyBal running, and while that's significant, the only blocker is k8s stuff; otherwise we would have switched to Liberica there. We are intentionally not rolling out both Liberica and PyBal to keep things uniform.)

Thanks @ssingh, that's completely understandable. If PyBal is going away, it's probably not worth the effort to fix it.

But I'm wondering if Liberica has the same issue? In other words, is it possible to create an invalid Liberica config via Puppet patches that would then be caught by the ConfdResourceFailed alerts? If so, then the task is still valid; it just needs to change scope. Let me know what you think.

I think that depends on which Liberica config. Liberica will still read the service definitions under hieradata/common/service.yaml (see modules/liberica/functions/service_from_wmflib.pp), and there is a CI check that tries to compile the catalog (see modules/profile/spec/classes/profile_liberica_spec.rb), so it certainly has more guard rails than PyBal.

Can you remind me what the invalid configuration was in this case? That way we can compare it against the Liberica checks to see whether it would have been caught.