
Failed to update Puppet repository /srv/git/operations/puppet on instance deployment-puppetserver-1 in project deployment-prep
Closed, Resolved (Public)

Description

Common information

  • summary: Failed to update Puppet repository /srv/git/operations/puppet on instance deployment-puppetserver-1 in project deployment-prep
  • alertname: PuppetSyncFailure
  • instance: deployment-puppetserver-1
  • job: node
  • project: deployment-prep
  • repository: /srv/git/operations/puppet
  • severity: warning

Event Timeline

bd808 claimed this task.
bd808 triaged this task as Medium priority.
bd808 moved this task from To Triage to Puppet errors on the Beta-Cluster-Infrastructure board.
bd808 subscribed.

https://gerrit.wikimedia.org/r/c/operations/puppet/+/1143602 had a rebase conflict after https://gerrit.wikimedia.org/r/c/operations/puppet/+/1166798 was merged. I manually resolved the conflicts, uploaded https://gerrit.wikimedia.org/r/c/operations/puppet/+/1143602/4..5, dropped the old cherry-pick on deployment-puppetserver-1, and cherry-picked the new https://gerrit.wikimedia.org/r/c/operations/puppet/+/1143602/5 version.
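
For anyone repeating the re-pick later, the patchset ref can be derived from the change number. This is only a sketch: Gerrit publishes each patchset under refs/changes/<last two digits of the change>/<change>/<patchset>, while the anonymous fetch URL and the helper names below are my assumptions and are not recorded in this task. Dropping the old cherry-pick is a separate step and is not shown.

```python
from typing import List

# Assumed anonymous HTTPS fetch URL for the repository (not taken from this task).
ASSUMED_REMOTE = "https://gerrit.wikimedia.org/r/operations/puppet"


def gerrit_change_ref(change: int, patchset: int) -> str:
    """Build the Gerrit ref for one patchset of a change.

    Gerrit shards change refs by the last two digits of the change number.
    """
    if change <= 0 or patchset <= 0:
        raise ValueError("change and patchset must be positive integers")
    shard = f"{change % 100:02d}"
    return f"refs/changes/{shard}/{change}/{patchset}"


def cherry_pick_commands(remote_url: str, change: int, patchset: int) -> List[str]:
    """Return the git commands that fetch and cherry-pick a single patchset."""
    ref = gerrit_change_ref(change, patchset)
    return [
        f"git fetch {remote_url} {ref}",
        "git cherry-pick FETCH_HEAD",
    ]


if __name__ == "__main__":
    # Patchset 5 of change 1143602, as referenced in the comment above.
    for command in cherry_pick_commands(ASSUMED_REMOTE, 1143602, 5):
        print(command)
```

The printed commands would be run from the repository named in the alert (/srv/git/operations/puppet) on deployment-puppetserver-1.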

Attempting to test on deployment-cache-text08 now shows:

Error: Could not retrieve catalog from remote server: Error 500 on SERVER: Server Error: Evaluation Error: Error while evaluating a Resource Statement, Evaluation Error: Error while evaluating a Function Call, Class[Profile::Cache::Haproxy]: parameter 'available_certificates' entry 'unified' unrecognized key 'ocsp' (file: /srv/puppet_code/environments/production/modules/role/manifests/cache/text.pp, line: 4, column: 5) on node deployment-cache-text08.deployment-prep.eqiad1.wikimedia.cloud

bd808 reopened this task as In Progress. Jul 14 2025, 8:50 PM

Caused by https://gerrit.wikimedia.org/r/c/operations/puppet/+/1167695.

https://gerrit.wikimedia.org/r/plugins/gitiles/cloud/instance-puppet/+/cd03967b5c4c4e9228d1013f894a1839d0760233%5E%21/#F0

diff --git a/deployment-prep/deployment-cache.yaml b/deployment-prep/deployment-cache.yaml
index 37d0111..9b9e782 100644
--- a/deployment-prep/deployment-cache.yaml
+++ b/deployment-prep/deployment-cache.yaml
@@ -42,7 +42,6 @@
     - /etc/acmecerts/unified/live/rsa-2048.chained.crt.key
     - /etc/acmecerts/unified/live/ec-prime256v1.chained.crt.key
     critical_threshold: 15
-    ocsp: false
     server_names:
     - '*.wikimedia.beta.wmflabs.org'
     - beta.wmflabs.org
@@ -97,7 +96,6 @@
     - /etc/acmecerts/unified/live/rsa-2048.chained.crt.key
     - /etc/acmecerts/unified/live/ec-prime256v1.chained.crt.key
     critical_threshold: 15
-    ocsp: false
     server_names:
     - '*.wikimedia.beta.wmflabs.org'
     - beta.wmflabs.org
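
A quick way to look for leftover copies of the removed key in other Hiera data is a recursive search. This is a minimal sketch, assuming the YAML has already been loaded into plain dicts and lists. The sample structure and the cert_paths key name are illustrative only; just available_certificates, unified, ocsp, critical_threshold and server_names come from the error and diff above.

```python
from typing import Any, Iterator, Tuple

Path = Tuple[Any, ...]


def find_key(data: Any, key: str, path: Path = ()) -> Iterator[Path]:
    """Yield the path to every occurrence of `key` in nested dicts and lists."""
    if isinstance(data, dict):
        for name, value in data.items():
            if name == key:
                yield path + (name,)
            # Keep descending: the key may also appear deeper in the tree.
            yield from find_key(value, key, path + (name,))
    elif isinstance(data, list):
        for index, item in enumerate(data):
            yield from find_key(item, key, path + (index,))


# Hypothetical data shaped like the pre-fix YAML; "cert_paths" is an assumed name.
sample = {
    "available_certificates": {
        "unified": {
            "cert_paths": [
                "/etc/acmecerts/unified/live/rsa-2048.chained.crt.key",
                "/etc/acmecerts/unified/live/ec-prime256v1.chained.crt.key",
            ],
            "critical_threshold": 15,
            "ocsp": False,
            "server_names": ["*.wikimedia.beta.wmflabs.org", "beta.wmflabs.org"],
        }
    }
}

if __name__ == "__main__":
    for hit in find_key(sample, "ocsp"):
        print(" -> ".join(str(part) for part in hit))
```

An empty result for a given file means no stale ocsp entries remain in it.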

bd808@deployment-cache-text08.deployment-prep.eqiad1:~$ sudo -i puppet agent -tv
Info: Using environment 'production'
Info: Retrieving pluginfacts
Info: Retrieving plugin
Info: Loading facts
Info: Caching catalog for deployment-cache-text08.deployment-prep.eqiad1.wikimedia.cloud
Info: Applying configuration version '(0c42c84b9a) gitpuppet - varnish: Allow customising "contact noc@" error'
Notice: /Stage[main]/Prometheus::Varnishkafka_exporter/Service[prometheus-varnishkafka-exporter]/ensure: ensure changed 'stopped' to 'running' (corrective)
Info: /Stage[main]/Prometheus::Varnishkafka_exporter/Service[prometheus-varnishkafka-exporter]: Unscheduling refresh on Service[prometheus-varnishkafka-exporter]
Warning: The directory '/etc/acmecerts/unified' contains 1013 entries, which exceeds the default soft limit 1000 and may cause excessive resource consumption and degraded performance. To remove this warning set a value for `max_files` parameter or consider using an alternate method to manage large directory trees
Notice: /Stage[main]/Profile::Cache::Varnish::Frontend/Varnish::Instance[text-frontend]/Varnish::Wikimedia_vcl[/etc/varnish/text-frontend.inc.vcl]/File[/etc/varnish/text-frontend.inc.vcl]/content:
--- /etc/varnish/text-frontend.inc.vcl  2025-07-09 06:52:13.723159848 +0000
+++ /tmp/puppet-file20250714-1305404-51fiu7     2025-07-14 21:00:43.770980277 +0000
@@ -445,65 +445,11 @@
 sub cluster_fe_hash { }

 sub cluster_fe_ratelimit {
-    // TODO: move all these rules to requestctl if possible.
-    // For now, add the requestctl header for them too,
-    // so that we have some more insight into which rule is kicking in
-    // Set the header to the empty string if not present.
+    // Add the requestctl header if it wasn't set in the previous layer
     if (!req.http.X-Requestctl) {
         set req.http.X-Requestctl = "";
     }

-    if (req.url ~ "^/api/rest_v1/page/pdf/") {
-        if (vsthrottle.is_denied("proton_limiter:" + req.http.X-Client-IP, 10, 10s)) {
-            set req.http.X-Requestctl = req.http.X-Requestctl + ",static_proton";
-            return (synth(429, "Too Many Requests"));
-        }
-    }
-
-    // Rate limit public cloud to 100/s with bursts of 1000 (excluding
-    // cloud IPs included in wikimedia_nets)
-    if (req.http.X-Provenance ~ "cloud=\w+"
-        && std.ip(req.http.X-Client-IP, "192.0.2.1") !~ wikimedia_nets
-        && vsthrottle.is_denied("public_cloud_all:" + req.http.X-Client-IP, 1000, 10s)) {
-        set req.http.X-Requestctl = req.http.X-Requestctl + ",static_public_cloud";
-        return (synth(429, "Too Many Requests. Please see https://wikitech.wikimedia.org/wiki/Beta/Blocked for more information."));
-    }
-
-    // T284479
-    if (std.ip(req.http.X-Client-IP, "192.0.2.1") ~ google_cloud_nets && req.http.User-Agent ~ "HeadlessChrome/" && req.url ~ "/w/index.php" && req.url ~ "[?&]search=") {
-        set req.http.X-Requestctl = req.http.X-Requestctl + ",static_T284479";
-        return (synth(403, "Forbidden; please see https://wikitech.wikimedia.org/wiki/Beta/Blocked for more information."));
-    }
-
-    // Ratelimit miss/pass requests per IP:
-    //   * Excluded for now:
-    //       * all WMF IPs (including labs)
-    //       * seemingly-authenticated requests (simple cookie check)
-    //   * RB and MW API, Wikidata: 1000/10s (100/s long term, with 1000 burst)
-    //   * All others from public cloud IPs: 100/10s (10/s long term, with 100 burst)
-    //   * All others: 1000/50s (20/s long term, with 1000 burst)
-    //       (current data leads us to believe sustaining 20/s should be
-    //       nearly impossible against standard MW outputs without
-    //       concurrency>1)
-    if (req.http.Cookie !~ "([sS]ession|Token)=" &&
-        std.ip(req.http.X-Client-IP, "192.0.2.1") !~ wikimedia_nets) {
-        if (req.url ~ "^/(api/rest_v1/|w/api.php|wiki/Special:EntityData)") {
-            if (vsthrottle.is_denied("rest:" + req.http.X-Client-IP, 1000, 10s)) {
-                set req.http.X-Requestctl = req.http.X-Requestctl + ",static_rest";
-                return (synth(429, "Too Many Requests"));
-            }
-        } else {
-            if (req.http.X-Provenance ~ "cloud=\w+"
-                && vsthrottle.is_denied("public_cloud_uncached:" + req.http.X-Client-IP, 100, 10s)) {
-                set req.http.X-Requestctl = req.http.X-Requestctl + ",static_public_cloud_uncached";
-                return (synth(429, "Too Many Requests"));
-            } else if (vsthrottle.is_denied("general:" + req.http.X-Client-IP, 1000, 50s)) {
-                set req.http.X-Requestctl = req.http.X-Requestctl + ",static_general";
-                return (synth(429, "Too Many Requests"));
-            }
-        }
-    }
-

 }


Notice: /Stage[main]/Profile::Cache::Varnish::Frontend/Varnish::Instance[text-frontend]/Varnish::Wikimedia_vcl[/etc/varnish/text-frontend.inc.vcl]/File[/etc/varnish/text-frontend.inc.vcl]/content: content changed '{sha256}a9830f2b1ec48f0169a0ade69dd3073799da76f980721a61cda4ac0a8f596ed7' to '{sha256}8b2d8c6cfe6704c31bc9047e1268e694285bfd4c1c0f1c1650f2fdeb1548b369'
Info: /Stage[main]/Profile::Cache::Varnish::Frontend/Varnish::Instance[text-frontend]/Varnish::Wikimedia_vcl[/etc/varnish/text-frontend.inc.vcl]/File[/etc/varnish/text-frontend.inc.vcl]: Scheduling refresh of Exec[load-new-vcl-file-frontend]
Info: Varnish::Wikimedia_vcl[/etc/varnish/text-frontend.inc.vcl]: Scheduling refresh of Exec[load-new-vcl-file-frontend]
Notice: /Stage[main]/Profile::Cache::Varnish::Frontend/Varnish::Instance[text-frontend]/Varnish::Wikimedia_vcl[/usr/share/varnish/tests/text-frontend.inc.vcl]/File[/usr/share/varnish/tests/text-frontend.inc.vcl]/content:
--- /usr/share/varnish/tests/text-frontend.inc.vcl      2025-07-09 06:52:13.819159920 +0000
+++ /tmp/puppet-file20250714-1305404-167o0fn    2025-07-14 21:00:43.826980308 +0000
@@ -455,65 +455,11 @@
 sub cluster_fe_hash { }

 sub cluster_fe_ratelimit {
-    // TODO: move all these rules to requestctl if possible.
-    // For now, add the requestctl header for them too,
-    // so that we have some more insight into which rule is kicking in
-    // Set the header to the empty string if not present.
+    // Add the requestctl header if it wasn't set in the previous layer
     if (!req.http.X-Requestctl) {
         set req.http.X-Requestctl = "";
     }

-    if (req.url ~ "^/api/rest_v1/page/pdf/") {
-        if (vsthrottle.is_denied("proton_limiter:" + req.http.X-Client-IP, 10, 10s)) {
-            set req.http.X-Requestctl = req.http.X-Requestctl + ",static_proton";
-            return (synth(429, "Too Many Requests"));
-        }
-    }
-
-    // Rate limit public cloud to 100/s with bursts of 1000 (excluding
-    // cloud IPs included in wikimedia_nets)
-    if (req.http.X-Provenance ~ "cloud=\w+"
-        && std.ip(req.http.X-Client-IP, "192.0.2.1") !~ wikimedia_nets
-        && vsthrottle.is_denied("public_cloud_all:" + req.http.X-Client-IP, 1000, 10s)) {
-        set req.http.X-Requestctl = req.http.X-Requestctl + ",static_public_cloud";
-        return (synth(429, "Too Many Requests. Please see https://wikitech.wikimedia.org/wiki/Beta/Blocked for more information."));
-    }
-
-    // T284479
-    if (std.ip(req.http.X-Client-IP, "192.0.2.1") ~ google_cloud_nets && req.http.User-Agent ~ "HeadlessChrome/" && req.url ~ "/w/index.php" && req.url ~ "[?&]search=") {
-        set req.http.X-Requestctl = req.http.X-Requestctl + ",static_T284479";
-        return (synth(403, "Forbidden; please see https://wikitech.wikimedia.org/wiki/Beta/Blocked for more information."));
-    }
-
-    // Ratelimit miss/pass requests per IP:
-    //   * Excluded for now:
-    //       * all WMF IPs (including labs)
-    //       * seemingly-authenticated requests (simple cookie check)
-    //   * RB and MW API, Wikidata: 1000/10s (100/s long term, with 1000 burst)
-    //   * All others from public cloud IPs: 100/10s (10/s long term, with 100 burst)
-    //   * All others: 1000/50s (20/s long term, with 1000 burst)
-    //       (current data leads us to believe sustaining 20/s should be
-    //       nearly impossible against standard MW outputs without
-    //       concurrency>1)
-    if (req.http.Cookie !~ "([sS]ession|Token)=" &&
-        std.ip(req.http.X-Client-IP, "192.0.2.1") !~ wikimedia_nets) {
-        if (req.url ~ "^/(api/rest_v1/|w/api.php|wiki/Special:EntityData)") {
-            if (vsthrottle.is_denied("rest:" + req.http.X-Client-IP, 1000, 10s)) {
-                set req.http.X-Requestctl = req.http.X-Requestctl + ",static_rest";
-                return (synth(429, "Too Many Requests"));
-            }
-        } else {
-            if (req.http.X-Provenance ~ "cloud=\w+"
-                && vsthrottle.is_denied("public_cloud_uncached:" + req.http.X-Client-IP, 100, 10s)) {
-                set req.http.X-Requestctl = req.http.X-Requestctl + ",static_public_cloud_uncached";
-                return (synth(429, "Too Many Requests"));
-            } else if (vsthrottle.is_denied("general:" + req.http.X-Client-IP, 1000, 50s)) {
-                set req.http.X-Requestctl = req.http.X-Requestctl + ",static_general";
-                return (synth(429, "Too Many Requests"));
-            }
-        }
-    }
-

     if (req.http.User-Agent ~ "^varnishTest") {
         if (vsthrottle.is_denied("varnishTest:" + req.http.X-Client-IP, 25, 5s)) {

Notice: /Stage[main]/Profile::Cache::Varnish::Frontend/Varnish::Instance[text-frontend]/Varnish::Wikimedia_vcl[/usr/share/varnish/tests/text-frontend.inc.vcl]/File[/usr/share/varnish/tests/text-frontend.inc.vcl]/content: content changed '{sha256}e7c1db4b6b9869f674c33cdc9ca501801f1e39369a2f87fef02868939cb79583' to '{sha256}fd2362bce193b882d6d945bdb4d8943a8c30555a6a406696074c4c1d0b87b65c'
Notice: /Stage[main]/Profile::Cache::Varnish::Frontend/Varnish::Instance[text-frontend]/Exec[load-new-vcl-file-frontend]: Triggered 'refresh' from 2 events
Notice: Applied catalog in 23.57 seconds