Hi, I'm unsure which exact tag to add, so I'm tagging this with the Abstract Wikipedia team, hoping that is good enough as a starting tag.
Today at 13:10:21 UTC we had a paging incident for wikifunctions.org (the actual wiki, not the orchestrator/evaluator).
FIRING: ProbeDown: Service mw-wikifunctions:4451 has failed probes (http_mw-wikifunctions_ip4) - https://wikitech.wikimedia.org/wiki/Runbook#mw-wikifunctions:4451 - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/service&var-module=All - https://alerts.wikimedia.org/?q=alertname%3DProbeDown
Looking at Grafana, this only manifested in our Dallas DC, a pretty good sign that it is related to some specific regional traffic. This is corroborated by the PHP-FPM workers for this specific wiki, which from ~11:00 UTC onwards are far more utilized, at times up to 100%.
Traffic hasn't increased correspondingly; it's still at the [same levels as before the event depicted above](https://grafana.wikimedia.org/d/35WSHOjVk/application-servers-red-k8s?orgId=1&from=1725605323479&to=1725630378539&var-site=codfw&var-deployment=mw-wikifunctions&var-method=GET&var-code=200&var-handler=php&var-service=mediawiki), namely max 6 rps.
Latencies and error rates have also increased considerably.
The small amount of traffic making it to the service (max 6 rps) and the still-in-development nature of wikifunctions make it unappealing to wield any ban hammer against whatever traffic is causing this.
Apache logs have the following:
{"timestamp": "2024-09-06T12:55:08", "RequestTime": "101", "Client-IP": "127.0.0.1", "Handle/Status": "-/414", "ResponseSize": "248", "Method": "-", "Url": "http://--", "MimeType": "text/html", "Referer": "-", "X-Forwarded-For": "-", "User-Agent": "-", "Accept-Language": "-", "X-Analytics": "-", "User": "-", "UserHeader": "-", "Connect-IP": "127.0.0.1", "X-Request-Id": "-", "X-Client-IP": "-"}
but not much more.
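In case it helps anyone triaging, here is a rough sketch of how the Apache JSON log lines could be tallied by status code, to see how common these 414s are. It assumes one JSON object per line and that the `Handle/Status` field always has the `handler/status` shape seen above; `count_statuses` is a made-up helper name, not an existing tool.

```python
import json
from collections import Counter


def count_statuses(lines):
    """Tally HTTP status codes from JSON access-log lines (hypothetical helper)."""
    counts = Counter()
    for line in lines:
        line = line.strip()
        if not line:
            continue
        try:
            entry = json.loads(line)
        except json.JSONDecodeError:
            continue  # skip anything that is not valid JSON
        if not isinstance(entry, dict):
            continue
        # "Handle/Status" looks like "-/414": handler first, status code last.
        _handler, _sep, status = entry.get("Handle/Status", "").rpartition("/")
        if status:
            counts[status] += 1
    return counts


# Trimmed copy of the log entry quoted above.
sample = (
    '{"timestamp": "2024-09-06T12:55:08", "RequestTime": "101", '
    '"Client-IP": "127.0.0.1", "Handle/Status": "-/414", '
    '"Method": "-", "Url": "http://--"}'
)
print(count_statuses([sample]))
```

Feeding it the real log file (for example `count_statuses(open(path))`, with `path` being wherever the log lives) would give a per-status breakdown for the incident window.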
The PHP-FPM logs are more telling (I think):
[06-Sep-2024 13:57:02] WARNING: [pool www] child 255443, script '/srv/mediawiki/docroot/wikifunctions.org/w/api.php' (request: "GET /w/api.php?action=wikilambda_perform_test&format=json&uselang=en&wikilambda_perform_test_zfunction=Z10132&wikilambda_perform_test_zimplementations=Z10133&wikilambda_perform_test_ztesters=Z10134") execution timed out (202.138660 sec), terminating
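To see which functions are behind these timeouts, something like the following could pull the duration and the `wikilambda_perform_test_zfunction` parameter out of each warning. This is only a sketch: the regex is written against the single log line above, so other PHP-FPM message shapes will simply not match, and `parse_timeout` is a hypothetical name.

```python
import re
from urllib.parse import parse_qs, urlsplit

# Pattern modelled on the one warning quoted above (an assumption about the format).
TIMEOUT_RE = re.compile(
    r"script '(?P<script>[^']+)' "
    r'\(request: "(?P<method>\S+) (?P<uri>[^"]+)"\) '
    r"execution timed out \((?P<seconds>[\d.]+) sec\)"
)


def parse_timeout(line):
    """Return details of a PHP-FPM timeout warning, or None if the line does not match."""
    m = TIMEOUT_RE.search(line)
    if m is None:
        return None
    query = parse_qs(urlsplit(m["uri"]).query)
    return {
        "script": m["script"],
        "method": m["method"],
        "seconds": float(m["seconds"]),
        "action": query.get("action", [None])[0],
        "zfunction": query.get("wikilambda_perform_test_zfunction", [None])[0],
    }


sample = (
    "[06-Sep-2024 13:57:02] WARNING: [pool www] child 255443, "
    "script '/srv/mediawiki/docroot/wikifunctions.org/w/api.php' "
    '(request: "GET /w/api.php?action=wikilambda_perform_test&format=json&uselang=en'
    "&wikilambda_perform_test_zfunction=Z10132"
    "&wikilambda_perform_test_zimplementations=Z10133"
    '&wikilambda_perform_test_ztesters=Z10134") '
    "execution timed out (202.138660 sec), terminating"
)
print(parse_timeout(sample))
```

Grouping the results by `zfunction` over the whole log should show whether a handful of functions account for most of the stuck workers.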
For reference, HTTP 414 (the status in the Apache log entry above) stands for URI Too Long.


