Introduce known-client identity objects and integrate with requestctl
Open, Medium, Public

Description

At a high level, this will proceed in two phases: identification of known-client requests in HAProxy and rate-limiting of identified requests in Varnish, each described in detail below.

Status: As of 2025-11-25, identification is live and enabled globally in cache-text and cache-upload, and rate-limiting is ready to be enabled in cache-text. cache-upload is not yet supported for rate-limiting, pending further discussion of an appropriate default limit.


Phase 1: Identification

In this phase, we introduce the known-client identity object and, for requests that identify as one of its configured User-Agent(s), the ability to:

  • apply an X-Provenance label-value pair and set X-Trusted-Request to "B" for requests that originate from IP blocks authorized to do so
  • deny (403) those that do not originate from those IP blocks

This includes both the basic CRUD support in HIDDENPARMA for managing these identities, as well as translation to haproxy DSL fragments and integration with our haproxy config confd template.

Once this is complete, requests from known clients can be identified as such (via X-Provenance and X-Trusted-Request) and requests attempting to impersonate them can be denied, but no specialized rate limits are applied - i.e., they're treated like any other request.
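As a rough illustration of the identification and denial behavior described above, here is a minimal Python sketch. It is not the HIDDENPARMA or haproxy implementation: the KnownClient fields, the classify helper, and the "client=<name>" format of the X-Provenance value are all assumptions made for the example.

```python
import ipaddress
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class KnownClient:
    # All field names here are hypothetical, chosen for illustration only.
    name: str
    ua_pattern: str   # regex matched against the User-Agent header
    cidrs: tuple      # IP blocks authorized to present that User-Agent


def classify(client, user_agent, source_ip):
    """Return (outcome, headers) for one request against one known client."""
    if not re.search(client.ua_pattern, user_agent):
        # The request does not claim to be this client at all.
        return "no_match", {}
    addr = ipaddress.ip_address(source_ip)
    if any(addr in ipaddress.ip_network(cidr) for cidr in client.cidrs):
        # Authorized source: label the request and mark it as trusted "B".
        # The "client=<name>" value format is an assumption.
        return "identified", {
            "X-Provenance": f"client={client.name}",
            "X-Trusted-Request": "B",
        }
    # Claims the User-Agent from an unauthorized source: would be a 403.
    return "denied", {}
```

For example, with a client defined as KnownClient("examplebot", r"^ExampleBot/", ("192.0.2.0/24",)), a request with User-Agent "ExampleBot/1.0" from 192.0.2.10 is identified, while the same User-Agent from 198.51.100.7 is denied.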

Logging these classification or denial decisions (or would-be decisions, when the functionality is disabled) will be a key part of gaining confidence that we're ready to move on to the next phase.

There should be a clear, deterministic outcome in either of the following cases of multiple-match:

  • A request that matches the identification or impersonation-detection behaviors of multiple known-client identities (e.g., first match wins, with identities ordered lexicographically).
  • A request that is classified as impersonating one client, but successfully identifies as another (e.g., an identified request is never denied).

One could imagine the latter scenario occurring if there exist multiple similarly named clients with distinct source IPs owned by the same organization (e.g., if overlapping User-Agent patterns are incorrectly configured).
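One way to make the multiple-match outcome deterministic, following the two examples given above (first match wins in lexicographic order, and an identified request is never denied), is sketched below. The resolve function and the outcome strings are hypothetical and only model the decision, not how it is expressed in haproxy.

```python
def resolve(outcomes):
    """Combine per-client outcomes into a single decision.

    `outcomes` maps a known-client name to one of "identified",
    "denied", or "no_match" (hypothetical labels for this sketch).
    """
    # Identification supersedes denial: check every client first.
    identified = sorted(n for n, o in outcomes.items() if o == "identified")
    if identified:
        # First match wins, with names ordered lexicographically.
        return "identified", identified[0]
    denied = sorted(n for n, o in outcomes.items() if o == "denied")
    if denied:
        # Denial is only applied once no client has identified the request.
        return "denied", denied[0]
    return "no_match", None
```

With this structure, resolve({"bot-a": "denied", "bot-b": "identified"}) yields ("identified", "bot-b"), so a misconfigured overlapping User-Agent pattern on bot-a cannot cause a legitimate bot-b request to be rejected.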

TODO:

  • Priorities: Currently, known client rules are ordered lexicographically by name - i.e., there's no way for the user to control the order they're applied. This is different from action-type entities which have a priority that determines sort order. While I anticipate that the behavior described above (denial is deferred until all rules have been applied, and is superseded by identification) will obviate certain use cases for fine control over rule ordering, it probably won't cover everything. Thus, we should consider introducing a similar priority mechanism, with name functioning only as a tie-breaker.
  • Selectors: In their current form, known-client objects technically support selectors (e.g., site scopes), but we do not support them in the UI. The original plan was to facilitate incremental rollout of potentially risky rules. We should decide whether that actually makes sense in practice, and if so, make it so.
  • Superset: The logging-mode tags (yellow) on known-client objects should become superset links for the respective x-requestctl action.
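The priority mechanism proposed in the first item could be as simple as a compound sort key, sketched here in Python. The KnownClientRule class and its priority field are hypothetical, and the assumption that a lower priority value sorts first is mine rather than something stated in this task.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class KnownClientRule:
    # Hypothetical shape; only the fields needed for ordering are shown.
    name: str
    priority: int = 0


def rule_order(rules):
    """Order rules by priority, with name used only as a tie-breaker.

    Assumes lower priority values are applied first.
    """
    return sorted(rules, key=lambda r: (r.priority, r.name))
```

Rules that share a priority keep today's lexicographic behavior, so objects without an explicit priority would be ordered exactly as they are now.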

Phase 2: Rate limiting

In this phase, we introduce the ability to side-step "default" requestctl rules for requests originating from identified known clients, enrolling them in a different set of defaults and supporting per-client (and cache cluster) overrides.

There are a couple of details to sort out here while we're working on the first phase, including selection of the known-client default rate limits and deciding which layers of request processing in the CDN support per-client rate limit overrides (i.e., all or only a subset of haproxy, varnish hits, varnish misses).

Regardless, at this stage it is likely that per-client rate limit overrides will be configured via "normal" action and haproxy_action objects with the scope:identified selector set, rather than being tightly integrated with the known-client identity object itself.

Update (2025-11-25) - The proposal from 2025-10-22 below is now implemented, and in the process of being enabled in cache-text as part of T406545: FY 25/26 WE 5.4.5: Enforce global rate-limits.

Update (2025-10-22) - After additional discussion, things have evolved a bit:

  • Per-client rate limit overrides must support limits that are less restrictive than the default from day 1. In short, that's impossible to support with "normal" scope:identified action objects as they exist today - i.e., admission by a given matching rule has no way to prevent rejection by a subsequent one (e.g., the default limit).
  • Rather than trying to add that kind of "accept on admit" option to actions in general (due to complexity and usability concerns), the simplest / fastest solution points back toward managing the limits directly in the known_client object, but doing so in a way that avoids introducing a new VCL rendering path for known_clients in HIDDENPARMA (since that too carries some complexity, which we may in fact throw away if actions become more powerful later on).
  • The absolute simplest option (h/t to @Joe) would be to side-step VCL rendering in HIDDENPARMA, and instead produce a simple VCL if / elseif chain directly from (enabled) known_client objects in a confd template. The closest precedent for something like this is ipblock-to-HAProxy-map rendering, which is similarly out-of-band from DSL rendering.
    • Like ipblocks, renaming carries some pitfalls - i.e., the delay between a known_client rename and subsequent commit could result in requests from that client transiently falling through to the default limit or having no limit at all, depending on how the VCL is structured (i.e., while HAProxy identification rules and match expressions on Varnish-side rate limits are out of sync).
    • On balance, that's strictly more graceful than the ipblock rename case, where a delayed commit can result in requests incorrectly classified as impersonation.
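To make the "simple VCL if / elseif chain" idea concrete, here is a Python sketch of what such rendering could look like. The real rendering lives in a confd template; this function, the dict keys, the "client=<name>" X-Provenance format, and the X-KC-Limit header used to carry the selected limit are all hypothetical stand-ins.

```python
def render_known_client_vcl(clients, default_limit):
    """Render an if / elseif chain selecting a rate limit per known client.

    `clients` is a list of dicts with hypothetical keys "name",
    "enabled", and "limit". Only enabled clients are rendered.
    """
    enabled = sorted(
        (c for c in clients if c["enabled"]), key=lambda c: c["name"]
    )
    if not enabled:
        # No chain to build: every identified request gets the default.
        return f'set req.http.X-KC-Limit = "{default_limit}";\n'
    out = []
    for i, c in enumerate(enabled):
        keyword = "if" if i == 0 else "} elseif"
        out.append(
            f'{keyword} (req.http.X-Provenance == "client={c["name"]}") {{'
        )
        out.append(f'    set req.http.X-KC-Limit = "{c["limit"]}";')
    # Requests whose label matches no rendered client fall through here.
    out.append("} else {")
    out.append(f'    set req.http.X-KC-Limit = "{default_limit}";')
    out.append("}")
    return "\n".join(out) + "\n"
```

In a chain shaped like this, the final else branch is where a renamed client would transiently land while the HAProxy-side label and the Varnish-side match expression are out of sync, which corresponds to the "falling through to the default limit" case above.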

Details

Related Changes in Gerrit:
| Subject | Repo | Branch | Lines +/- |
|  | operations/puppet | production | +1 -1 |
|  | operations/puppet | production | +64 -0 |
|  | operations/puppet | production | +170 -138 |
|  | operations/puppet | production | +64 -0 |
|  | operations/puppet | production | +2 -0 |
|  | operations/puppet | production | +0 -1 |
|  | operations/puppet | production | +1 -0 |
|  | operations/puppet | production | +0 -1 |
|  | operations/puppet | production | +5 -0 |
|  | operations/puppet | production | +1 -0 |
|  | operations/puppet | production | +6 -1 |
|  | operations/puppet | production | +4 -0 |
|  | operations/puppet | production | +19 -5 |
|  | operations/puppet | production | +29 -16 |
|  | operations/puppet | production | +33 -25 |
|  | operations/puppet | production | +3 -0 |
|  | operations/puppet | production | +113 -63 |
Related Changes in GitLab:
| Title | Reference | Author | Source Branch | Dest Branch |
| Imbue known_client objects with a priority | repos/sre/hiddenparma!129 | swfrench | work/swfrench/known-clients/identification/priority | main |
| Include superset URLs for known-client objects | repos/sre/hiddenparma!128 | swfrench | work/swfrench/known-clients/identification/superset | main |
| Introduce per-cluster rate limits to `known_client` | repos/sre/hiddenparma!122 | swfrench | work/swfrench/known-clients/rate-limiting/add-fields | main |
| Use request variables for internal headers in known-client DSL | repos/sre/hiddenparma!120 | swfrench | work/swfrench/known-clients/identification/use-variables-in-dsl | main |
| Introduce output DSL rendering for known_client objects | repos/sre/hiddenparma!115 | swfrench | work/swfrench/known-clients/identification/output-dsl | main |
| Introduce DSL rendering for known_client objects | repos/sre/hiddenparma!112 | swfrench | work/swfrench/known-clients/identification/entity-dsl | main |
| Refactor acl naming and add entity-type annotation | repos/sre/hiddenparma!108 | swfrench | work/swfrench/known-clients/identification/entity-aclname | main |
| Introduce the known_client object and basic admin capabilities | repos/sre/hiddenparma!104 | swfrench | work/swfrench/known-clients/identification/admin | main |

Event Timeline

Scott_French triaged this task as Medium priority.
Scott_French renamed this task from "Introduce known-client-identity objects and integrate with requestctl" to "Introduce known-client identity objects and integrate with requestctl". Sep 10 2025, 12:16 AM
Scott_French updated the task description.

Change #1192616 had a related patch set uploaded (by Scott French; author: Scott French):

[operations/puppet@production] P:conftool::requestctl_client: update requestctl_cli.original.py

https://gerrit.wikimedia.org/r/1192616

Change #1192620 had a related patch set uploaded (by Scott French; author: Scott French):

[operations/puppet@production] P:conftool::hiddenparma: enable known_client_expression_validation

https://gerrit.wikimedia.org/r/1192620

Change #1192616 merged by Scott French:

[operations/puppet@production] P:conftool::requestctl_client: update requestctl_cli.original.py

https://gerrit.wikimedia.org/r/1192616

Change #1192620 merged by Scott French:

[operations/puppet@production] P:conftool::hiddenparma: enable known_client_expression_validation

https://gerrit.wikimedia.org/r/1192620

Change #1193275 had a related patch set uploaded (by Scott French; author: Scott French):

[operations/puppet@production] P:cache::haproxy: start preparing for known-client DSL

https://gerrit.wikimedia.org/r/1193275

swfrench opened https://gitlab.wikimedia.org/repos/sre/hiddenparma/-/merge_requests/120

Draft: Use request variables for internal headers in known-client DSL

Change #1198182 had a related patch set uploaded (by Scott French; author: Scott French):

[operations/puppet@production] P:cache::varish::frontend: render known-client rate limit VCL

https://gerrit.wikimedia.org/r/1198182

Change #1198183 had a related patch set uploaded (by Scott French; author: Scott French):

[operations/puppet@production] P:cache::varnish::frontend: wire known-client rate limits into Varnish

https://gerrit.wikimedia.org/r/1198183

Change #1193276 had a related patch set uploaded (by Scott French; author: Scott French):

[operations/puppet@production] P:cache::haproxy: move x_requestctl setup into listen section

https://gerrit.wikimedia.org/r/1193276

Change #1193275 merged by Scott French:

[operations/puppet@production] P:cache::haproxy: start preparing for known-client DSL

https://gerrit.wikimedia.org/r/1193275

Change #1196543 had a related patch set uploaded (by Scott French; author: Scott French):

[operations/puppet@production] P:cache::haproxy: introduce known-client DSL fragment

https://gerrit.wikimedia.org/r/1196543

Change #1193276 merged by Scott French:

[operations/puppet@production] P:cache::haproxy: move x_requestctl setup into listen section

https://gerrit.wikimedia.org/r/1193276

Change #1196543 merged by Scott French:

[operations/puppet@production] P:cache::haproxy: introduce known-client DSL fragment

https://gerrit.wikimedia.org/r/1196543

Change #1196544 had a related patch set uploaded (by Scott French; author: Scott French):

[operations/puppet@production] hieradata: pilot use_etcd_known_client_ident on cp2041

https://gerrit.wikimedia.org/r/1196544

Change #1200397 had a related patch set uploaded (by Scott French; author: Scott French):

[operations/puppet@production] haproxy: add known-client DSL fixture in tests

https://gerrit.wikimedia.org/r/1200397

Change #1200397 merged by Scott French:

[operations/puppet@production] haproxy: add known-client DSL fixture in tests

https://gerrit.wikimedia.org/r/1200397

Change #1196544 merged by Scott French:

[operations/puppet@production] hieradata: pilot use_etcd_known_client_ident on cp2041

https://gerrit.wikimedia.org/r/1196544

Change #1201811 had a related patch set uploaded (by Scott French; author: Scott French):

[operations/puppet@production] P:cache::haproxy: update confd watch_keys for known-client DSL

https://gerrit.wikimedia.org/r/1201811

Change #1201811 merged by Scott French:

[operations/puppet@production] P:cache::haproxy: update confd watch_keys for known-client DSL

https://gerrit.wikimedia.org/r/1201811

Change #1201824 had a related patch set uploaded (by Scott French; author: Scott French):

[operations/puppet@production] hieradata: end use_etcd_known_client_ident pilot on cp2041

https://gerrit.wikimedia.org/r/1201824

Change #1201824 merged by Scott French:

[operations/puppet@production] hieradata: end use_etcd_known_client_ident pilot on cp2041

https://gerrit.wikimedia.org/r/1201824

Alright, progress!

Aside from the oversight addressed in https://gerrit.wikimedia.org/r/1201811 that initially prevented confd from picking up known-client DSL updates, the pilot went quite well.

To recap, the goal was to test known-client functionality on a single cache-text host, with the specific host (cp2041) chosen because it is in the site that receives the vast majority of traffic from the client targeted in the test.


Good news -

Identification works as expected in both logging (x-analytics update only) and enabled (x-provenance, x-trusted-request update) modes - e.g., in the latter case, I was able to use a log-only Varnish action to ensure visibility of the x-provenance update upstream (note: the fact that I was able to do that is an artifact of how varnish actions are not yet skipped for x-trusted-request "B").

Similarly, impersonation detection / denial does not produce spurious detection / denial, though I was unable to observe an "organic" denial during my test (and ran out of time before I could figure out a straightforward way to trigger a denial externally on the specific pilot host).

Bad news -

The one issue that came up is the following:

For x-trusted-request "B" traffic, we skip the haproxy-action DSL backend and proceed directly to upstream varnish. Currently, that also means we do not populate the x-requestctl header from txn.x_requestctl (i.e., it has its "empty" default value).

That means that varnish starts from a blank slate when populating x-requestctl, and thus we lose any actions previously logged at haproxy when varnish returns the x-analytics response header extended with actions that were logged there (which at that point we take to be authoritative when updating x-analytics in haproxy).

But, Scott, you might say, why does that matter if we've skipped the rules that might have logged actions at haproxy? What's lost?

Well, this means that the action of positively identifying a request is no longer logged - i.e., it certainly happens (in that, x-provenance etc. are updated), but it's invisible. That's bad for debuggability.

Further, imagine a scenario where we have a known-client in identify-enabled mode, and wish to also enable impersonation denial. To do that safely, we enable log-only mode for the latter to assess the potential impact. Well, in that case, you can't - i.e., the logged "would have been denied" event is lost.

In any case, this is a minor speed bump, and it should be fine to populate x-requestctl in the "proceed directly to upstream varnish" case as well. I just need to check a couple of things to confirm that.
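The header propagation issue described above, and the proposed fix, can be modeled with a small Python sketch. This is a toy model only: the function names, the comma-separated header format, and the action labels in the usage note are assumptions, not the actual haproxy or varnish behavior.

```python
def varnish_actions(x_requestctl_header, varnish_logged):
    """Model varnish extending the incoming x-requestctl value.

    Varnish starts from whatever arrived in the header and appends the
    actions it logged itself (comma-separated format is an assumption).
    """
    prior = [a for a in x_requestctl_header.split(",") if a]
    return prior + varnish_logged


def final_actions(haproxy_logged, populate_header, varnish_logged):
    """Return the action list haproxy would treat as authoritative.

    `populate_header` models whether haproxy copies txn.x_requestctl
    into the x-requestctl request header before going upstream.
    """
    header = ",".join(haproxy_logged) if populate_header else ""
    return varnish_actions(header, varnish_logged)
```

With hypothetical labels, final_actions(["known-client/examplebot"], False, ["cache-text/some_log_rule"]) drops the haproxy-side identification entry, while passing True retains both entries, which is the behavior the fix aims for.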

Change #1201844 had a related patch set uploaded (by Scott French; author: Scott French):

[operations/puppet@production] P:cache::haproxy: ensure x-requestctl is updated

https://gerrit.wikimedia.org/r/1201844

Change #1201844 merged by Scott French:

[operations/puppet@production] P:cache::haproxy: ensure x-requestctl is updated

https://gerrit.wikimedia.org/r/1201844

Change #1202208 had a related patch set uploaded (by Scott French; author: Scott French):

[operations/puppet@production] hieradata: use_etcd_known_client_ident pilot on cp2041 (#2)

https://gerrit.wikimedia.org/r/1202208

Change #1202208 merged by Scott French:

[operations/puppet@production] hieradata: use_etcd_known_client_ident pilot on cp2041 (#2)

https://gerrit.wikimedia.org/r/1202208

Change #1202272 had a related patch set uploaded (by Scott French; author: Scott French):

[operations/puppet@production] hieradata: end use_etcd_known_client_ident pilot on cp2041 (#2)

https://gerrit.wikimedia.org/r/1202272

Change #1202272 merged by Scott French:

[operations/puppet@production] hieradata: end use_etcd_known_client_ident pilot on cp2041 (#2)

https://gerrit.wikimedia.org/r/1202272

Alright, pilot round #2 worked as expected: x-requestctl logging was retained, even in the identification-enabled case. Further, just by chance, we were able to observe (logged) would-be impersonation denials (i.e., a client presenting the known-client's UA from an IP not authorized to do so).

I believe we're ready to enable the identification feature globally.

Note: Although x-trusted-request "B" scored requests skip the default requestctl-filter backend in haproxy, they do not yet do so in varnish. There is some ongoing work to implement the per-score details, which needs to be aligned with the pending changes tracked here for identified-bot rate limits (which introduce the "B" score limits).

Change #1202306 had a related patch set uploaded (by Scott French; author: Scott French):

[operations/puppet@production] hiera: enable haproxy known-client identification

https://gerrit.wikimedia.org/r/1202306

Change #1202306 merged by Scott French:

[operations/puppet@production] hiera: enable haproxy known-client identification

https://gerrit.wikimedia.org/r/1202306

Change #1205207 had a related patch set uploaded (by Scott French; author: Scott French):

[operations/puppet@production] P:cache::varnish::frontend: render known-client rate limit VCL

https://gerrit.wikimedia.org/r/1205207

Change #1205207 abandoned by Scott French:

[operations/puppet@production] P:cache::varnish::frontend: render known-client rate limit VCL

Reason:

Test-only change

https://gerrit.wikimedia.org/r/1205207

Change #1198183 abandoned by Scott French:

[operations/puppet@production] P:cache::varnish::frontend: wire known-client rate limits into Varnish

Reason:

Superseded by I4484d65d50eeccc992d18df2a39c678ab592feec

https://gerrit.wikimedia.org/r/1198183

Change #1198182 merged by Fabfur:

[operations/puppet@production] P:cache::varnish::frontend: render known-client rate limit VCL

https://gerrit.wikimedia.org/r/1198182

Change #1211153 had a related patch set uploaded (by Scott French; author: Scott French):

[operations/puppet@production] P:cache::varnish::frontend: fix confd filename normalization

https://gerrit.wikimedia.org/r/1211153

Change #1211153 merged by Scott French:

[operations/puppet@production] P:cache::varnish::frontend: fix confd filename normalization

https://gerrit.wikimedia.org/r/1211153

For later reference, https://gerrit.wikimedia.org/r/c/operations/puppet/+/1203187 is the varnishtest setup used to validate integration of the confd-rendered VCL fragment for rate limiting.