wide-scale Python failure
Open, High, Public, BUG REPORT

Description

Steps to replicate the issue (include links if applicable):

What happens?:

image.png (639×1 px, 51 KB)

What should have happened instead?:
It should work. This is an incredibly simple Python implementation. We have trouble with Python everywhere at the moment. This is just one example.

Event Timeline

I've deployed a new version of the evaluators, and this is now fixed.

In running the test command before deploying the above, it indeed failed:

{"Z1K1":"Z22","Z22K1":"Z24","Z22K2":{"Z1K1":{"Z1K1":"Z7","Z7K1":"Z883","Z883K1":"Z6","Z883K2":"Z1"},"K1":[{"Z1K1":"Z7","Z7K1":"Z882","Z882K1":"Z6","Z882K2":"Z1"},{"Z1K1":{"Z1K1":"Z7","Z7K1":"Z882","Z882K1":"Z6","Z882K2":"Z1"},"K1":"errors","K2":{"Z1K1":"Z5","Z5K1":"Z507","Z5K2":{"Z1K1":{"Z1K1":"Z7","Z7K1":"Z885","Z885K1":"Z507"},"Z507K1":{"Z1K1":"Z99","Z99K1":{"Z1K1":{"Z1K1":"Z9","Z9K1":"Z7"},"Z7K1":{"Z1K1":{"Z1K1":"Z9","Z9K1":"Z8"},"Z8K1":{"Z1K1":{"Z1K1":{"Z1K1":"Z9","Z9K1":"Z7"},"Z7K1":{"Z1K1":"Z9","Z9K1":"Z881"},"Z881K1":{"Z1K1":"Z9","Z9K1":"Z17"}},"K1":{"Z1K1":{"Z1K1":"Z9","Z9K1":"Z17"},"Z17K1":{"Z1K1":"Z9","Z9K1":"Z6"},"Z17K2":{"Z1K1":"Z6","Z6K1":"Z400K1"},"Z17K3":{"Z1K1":{"Z1K1":"Z9","Z9K1":"Z12"},"Z12K1":{"Z1K1":{"Z1K1":{"Z1K1":"Z9","Z9K1":"Z7"},"Z7K1":{"Z1K1":"Z9","Z9K1":"Z881"},"Z881K1":{"Z1K1":"Z9","Z9K1":"Z11"}}}}},"K2":{"Z1K1":{"Z1K1":{"Z1K1":"Z9","Z9K1":"Z7"},"Z7K1":{"Z1K1":"Z9","Z9K1":"Z881"},"Z881K1":{"Z1K1":"Z9","Z9K1":"Z17"}},"K1":{"Z1K1":{"Z1K1":"Z9","Z9K1":"Z17"},"Z17K1":{"Z1K1":"Z9","Z9K1":"Z6"},"Z17K2":{"Z1K1":"Z6","Z6K1":"Z400K2"},"Z17K3":{"Z1K1":{"Z1K1":"Z9","Z9K1":"Z12"},"Z12K1":{"Z1K1":{"Z1K1":{"Z1K1":"Z9","Z9K1":"Z7"},"Z7K1":{"Z1K1":"Z9","Z9K1":"Z881"},"Z881K1":{"Z1K1":"Z9","Z9K1":"Z11"}}}}},"K2":{"Z1K1":{"Z1K1":{"Z1K1":"Z9","Z9K1":"Z7"},"Z7K1":{"Z1K1":"Z9","Z9K1":"Z881"},"Z881K1":{"Z1K1":"Z9","Z9K1":"Z17"}}}}},"Z8K2":{"Z1K1":"Z9","Z9K1":"Z1"},"Z8K3":{"Z1K1":{"Z1K1":{"Z1K1":"Z9","Z9K1":"Z7"},"Z7K1":{"Z1K1":"Z9","Z9K1":"Z881"},"Z881K1":{"Z1K1":"Z9","Z9K1":"Z20"}}},"Z8K4":{"Z1K1":{"Z1K1":{"Z1K1":"Z9","Z9K1":"Z7"},"Z7K1":{"Z1K1":"Z9","Z9K1":"Z881"},"Z881K1":{"Z1K1":"Z9","Z9K1":"Z14"}},"K1":{"Z1K1":{"Z1K1":"Z9","Z9K1":"Z14"},"Z14K1":{"Z1K1":"Z9","Z9K1":"Z400"},"Z14K3":{"Z1K1":{"Z1K1":"Z9","Z9K1":"Z16"},"Z16K1":{"Z1K1":{"Z1K1":"Z9","Z9K1":"Z61"},"Z61K1":{"Z1K1":"Z6","Z6K1":"python"}},"Z16K2":{"Z1K1":"Z6","Z6K1":"def Z400(Z400K1,Z400K2):\n\treturn str(int(Z400K1) + 
int(Z400K2))"}}},"K2":{"Z1K1":{"Z1K1":{"Z1K1":"Z9","Z9K1":"Z7"},"Z7K1":{"Z1K1":"Z9","Z9K1":"Z881"},"Z881K1":{"Z1K1":"Z9","Z9K1":"Z14"}}}},"Z8K5":{"Z1K1":"Z9","Z9K1":"Z400"}},"Z400K1":{"Z1K1":"Z6","Z6K1":"5"},"Z400K2":{"Z1K1":"Z6","Z6K1":"8"}}},"Z507K2":{"Z1K1":"Z5","Z5K1":"Z500","Z5K2":{"Z1K1":{"Z1K1":"Z7","Z7K1":"Z885","Z885K1":"Z500"},"Z500K1":"Function evaluation failed with status 504: {\"Z1K1\":{\"Z1K1\":\"Z9\",\"Z9K1\":\"Z22\"},\"Z22K1\":{\"Z1K1\":\"Z9\",\"Z9K1\":\"Z24\"},\"Z22K2\":{\"Z1K1\":{\"Z1K1\":{\"Z1K1\":\"Z9\",\"Z9K1\":\"Z7\"},\"Z7K1\":{\"Z1K1\":\"Z9\",\"Z9K1\":\"Z883\"},\"Z883K1\":{\"Z1K1\":\"Z9\",\"Z9K1\":\"Z6\"},\"Z883K2\":{\"Z1K1\":\"Z9\",\"Z9K1\":\"Z1\"}},\"K1\":{\"Z1K1\":{\"Z1K1\":{\"Z1K1\":\"Z9\",\"Z9K1\":\"Z7\"},\"Z7K1\":{\"Z1K1\":\"Z9\",\"Z9K1\":\"Z881\"},\"Z881K1\":{\"Z1K1\":{\"Z1K1\":\"Z9\",\"Z9K1\":\"Z7\"},\"Z7K1\":{\"Z1K1\":\"Z9\",\"Z9K1\":\"Z882\"},\"Z882K1\":{\"Z1K1\":\"Z9\",\"Z9K1\":\"Z6\"},\"Z882K2\":{\"Z1K1\":\"Z9\",\"Z9K1\":\"Z1\"}}},\"K1\":{\"Z1K1\":{\"Z1K1\":{\"Z1K1\":\"Z9\",\"Z9K1\":\"Z7\"},\"Z7K1\":{\"Z1K1\":\"Z9\",\"Z9K1\":\"Z882\"},\"Z882K1\":{\"Z1K1\":\"Z9\",\"Z9K1\":\"Z6\"},\"Z882K2\":{\"Z1K1\":\"Z9\",\"Z9K1\":\"Z1\"}},\"K1\":{\"Z1K1\":\"Z6\",\"Z6K1\":\"errors\"},\"K2\":{\"Z1K1\":{\"Z1K1\":\"Z9\",\"Z9K1\":\"Z5\"},\"Z5K1\":{\"Z1K1\":\"Z9\",\"Z9K1\":\"Z575\"},\"Z5K2\":{\"Z1K1\":{\"Z1K1\":{\"Z1K1\":\"Z9\",\"Z9K1\":\"Z7\"},\"Z7K1\":{\"Z1K1\":\"Z9\",\"Z9K1\":\"Z885\"},\"Z885K1\":{\"Z1K1\":\"Z9\",\"Z9K1\":\"Z575\"}},\"Z575K1\":{\"Z1K1\":\"Z6\",\"Z6K1\":\"9000 
ms\"}}}},\"K2\":{\"Z1K1\":{\"Z1K1\":{\"Z1K1\":\"Z9\",\"Z9K1\":\"Z7\"},\"Z7K1\":{\"Z1K1\":\"Z9\",\"Z9K1\":\"Z881\"},\"Z881K1\":{\"Z1K1\":{\"Z1K1\":\"Z9\",\"Z9K1\":\"Z7\"},\"Z7K1\":{\"Z1K1\":\"Z9\",\"Z9K1\":\"Z882\"},\"Z882K1\":{\"Z1K1\":\"Z9\",\"Z9K1\":\"Z6\"},\"Z882K2\":{\"Z1K1\":\"Z9\",\"Z9K1\":\"Z1\"}}},\"K1\":{\"Z1K1\":{\"Z1K1\":{\"Z1K1\":\"Z9\",\"Z9K1\":\"Z7\"},\"Z7K1\":{\"Z1K1\":\"Z9\",\"Z9K1\":\"Z882\"},\"Z882K1\":{\"Z1K1\":\"Z9\",\"Z9K1\":\"Z6\"},\"Z882K2\":{\"Z1K1\":\"Z9\",\"Z9K1\":\"Z1\"}},\"K1\":{\"Z1K1\":\"Z6\",\"Z6K1\":\"evaluationMemoryUsage\"},\"K2\":{\"Z1K1\":\"Z6\",\"Z6K1\":\"123.33 MiB\"}},\"K2\":{\"Z1K1\":{\"Z1K1\":{\"Z1K1\":\"Z9\",\"Z9K1\":\"Z7\"},\"Z7K1\":{\"Z1K1\":\"Z9\",\"Z9K1\":\"Z881\"},\"Z881K1\":{\"Z1K1\":{\"Z1K1\":\"Z9\",\"Z9K1\":\"Z7\"},\"Z7K1\":{\"Z1K1\":\"Z9\",\"Z9K1\":\"Z882\"},\"Z882K1\":{\"Z1K1\":\"Z9\",\"Z9K1\":\"Z6\"},\"Z882K2\":{\"Z1K1\":\"Z9\",\"Z9K1\":\"Z1\"}}},\"K1\":{\"Z1K1\":{\"Z1K1\":{\"Z1K1\":\"Z9\",\"Z9K1\":\"Z7\"},\"Z7K1\":{\"Z1K1\":\"Z9\",\"Z9K1\":\"Z882\"},\"Z882K1\":{\"Z1K1\":\"Z9\",\"Z9K1\":\"Z6\"},\"Z882K2\":{\"Z1K1\":\"Z9\",\"Z9K1\":\"Z1\"}},\"K1\":{\"Z1K1\":\"Z6\",\"Z6K1\":\"evaluationCpuUsage\"},\"K2\":{\"Z1K1\":\"Z6\",\"Z6K1\":\"14.568 
ms\"}},\"K2\":{\"Z1K1\":{\"Z1K1\":{\"Z1K1\":\"Z9\",\"Z9K1\":\"Z7\"},\"Z7K1\":{\"Z1K1\":\"Z9\",\"Z9K1\":\"Z881\"},\"Z881K1\":{\"Z1K1\":{\"Z1K1\":\"Z9\",\"Z9K1\":\"Z7\"},\"Z7K1\":{\"Z1K1\":\"Z9\",\"Z9K1\":\"Z882\"},\"Z882K1\":{\"Z1K1\":\"Z9\",\"Z9K1\":\"Z6\"},\"Z882K2\":{\"Z1K1\":\"Z9\",\"Z9K1\":\"Z1\"}}},\"K1\":{\"Z1K1\":{\"Z1K1\":{\"Z1K1\":\"Z9\",\"Z9K1\":\"Z7\"},\"Z7K1\":{\"Z1K1\":\"Z9\",\"Z9K1\":\"Z882\"},\"Z882K1\":{\"Z1K1\":\"Z9\",\"Z9K1\":\"Z6\"},\"Z882K2\":{\"Z1K1\":\"Z9\",\"Z9K1\":\"Z1\"}},\"K1\":{\"Z1K1\":\"Z6\",\"Z6K1\":\"evaluationStartTime\"},\"K2\":{\"Z1K1\":\"Z6\",\"Z6K1\":\"2025-10-09T13:30:46.487Z\"}},\"K2\":{\"Z1K1\":{\"Z1K1\":{\"Z1K1\":\"Z9\",\"Z9K1\":\"Z7\"},\"Z7K1\":{\"Z1K1\":\"Z9\",\"Z9K1\":\"Z881\"},\"Z881K1\":{\"Z1K1\":{\"Z1K1\":\"Z9\",\"Z9K1\":\"Z7\"},\"Z7K1\":{\"Z1K1\":\"Z9\",\"Z9K1\":\"Z882\"},\"Z882K1\":{\"Z1K1\":\"Z9\",\"Z9K1\":\"Z6\"},\"Z882K2\":{\"Z1K1\":\"Z9\",\"Z9K1\":\"Z1\"}}},\"K1\":{\"Z1K1\":{\"Z1K1\":{\"Z1K1\":\"Z9\",\"Z9K1\":\"Z7\"},\"Z7K1\":{\"Z1K1\":\"Z9\",\"Z9K1\":\"Z882\"},\"Z882K1\":{\"Z1K1\":\"Z9\",\"Z9K1\":\"Z6\"},\"Z882K2\":{\"Z1K1\":\"Z9\",\"Z9K1\":\"Z1\"}},\"K1\":{\"Z1K1\":\"Z6\",\"Z6K1\":\"evaluationEndTime\"},\"K2\":{\"Z1K1\":\"Z6\",\"Z6K1\":\"2025-10-09T13:30:55.487Z\"}},\"K2\":{\"Z1K1\":{\"Z1K1\":{\"Z1K1\":\"Z9\",\"Z9K1\":\"Z7\"},\"Z7K1\":{\"Z1K1\":\"Z9\",\"Z9K1\":\"Z881\"},\"Z881K1\":{\"Z1K1\":{\"Z1K1\":\"Z9\",\"Z9K1\":\"Z7\"},\"Z7K1\":{\"Z1K1\":\"Z9\",\"Z9K1\":\"Z882\"},\"Z882K1\":{\"Z1K1\":\"Z9\",\"Z9K1\":\"Z6\"},\"Z882K2\":{\"Z1K1\":\"Z9\",\"Z9K1\":\"Z1\"}}},\"K1\":{\"Z1K1\":{\"Z1K1\":{\"Z1K1\":\"Z9\",\"Z9K1\":\"Z7\"},\"Z7K1\":{\"Z1K1\":\"Z9\",\"Z9K1\":\"Z882\"},\"Z882K1\":{\"Z1K1\":\"Z9\",\"Z9K1\":\"Z6\"},\"Z882K2\":{\"Z1K1\":\"Z9\",\"Z9K1\":\"Z1\"}},\"K1\":{\"Z1K1\":\"Z6\",\"Z6K1\":\"evaluationDuration\"},\"K2\":{\"Z1K1\":\"Z6\",\"Z6K1\":\"9000 
ms\"}},\"K2\":{\"Z1K1\":{\"Z1K1\":{\"Z1K1\":\"Z9\",\"Z9K1\":\"Z7\"},\"Z7K1\":{\"Z1K1\":\"Z9\",\"Z9K1\":\"Z881\"},\"Z881K1\":{\"Z1K1\":{\"Z1K1\":\"Z9\",\"Z9K1\":\"Z7\"},\"Z7K1\":{\"Z1K1\":\"Z9\",\"Z9K1\":\"Z882\"},\"Z882K1\":{\"Z1K1\":\"Z9\",\"Z9K1\":\"Z6\"},\"Z882K2\":{\"Z1K1\":\"Z9\",\"Z9K1\":\"Z1\"}}},\"K1\":{\"Z1K1\":{\"Z1K1\":{\"Z1K1\":\"Z9\",\"Z9K1\":\"Z7\"},\"Z7K1\":{\"Z1K1\":\"Z9\",\"Z9K1\":\"Z882\"},\"Z882K1\":{\"Z1K1\":\"Z9\",\"Z9K1\":\"Z6\"},\"Z882K2\":{\"Z1K1\":\"Z9\",\"Z9K1\":\"Z1\"}},\"K1\":{\"Z1K1\":\"Z6\",\"Z6K1\":\"evaluationHostname\"},\"K2\":{\"Z1K1\":\"Z6\",\"Z6K1\":\"function-evaluator-python-evaluator-f9fb846bc-tb8cw\"}},\"K2\":{\"Z1K1\":{\"Z1K1\":{\"Z1K1\":\"Z9\",\"Z9K1\":\"Z7\"},\"Z7K1\":{\"Z1K1\":\"Z9\",\"Z9K1\":\"Z881\"},\"Z881K1\":{\"Z1K1\":{\"Z1K1\":\"Z9\",\"Z9K1\":\"Z7\"},\"Z7K1\":{\"Z1K1\":\"Z9\",\"Z9K1\":\"Z882\"},\"Z882K1\":{\"Z1K1\":\"Z9\",\"Z9K1\":\"Z6\"},\"Z882K2\":{\"Z1K1\":\"Z9\",\"Z9K1\":\"Z1\"}}}}}}}}}}}}}"}}}}},{"Z1K1":{"Z1K1":"Z7","Z7K1":"Z882","Z882K1":"Z6","Z882K2":"Z1"},"K1":"orchestrationMemoryUsage","K2":"162.86 MiB"},{"Z1K1":{"Z1K1":"Z7","Z7K1":"Z882","Z882K1":"Z6","Z882K2":"Z1"},"K1":"orchestrationCpuUsage","K2":"56.219 ms"},{"Z1K1":{"Z1K1":"Z7","Z7K1":"Z882","Z882K1":"Z6","Z882K2":"Z1"},"K1":"orchestrationStartTime","K2":"2025-10-09T13:30:46.470Z"},{"Z1K1":{"Z1K1":"Z7","Z7K1":"Z882","Z882K1":"Z6","Z882K2":"Z1"},"K1":"orchestrationEndTime","K2":"2025-10-09T13:30:55.494Z"},{"Z1K1":{"Z1K1":"Z7","Z7K1":"Z882","Z882K1":"Z6","Z882K2":"Z1"},"K1":"orchestrationDuration","K2":"9024 ms"},{"Z1K1":{"Z1K1":"Z7","Z7K1":"Z882","Z882K1":"Z6","Z882K2":"Z1"},"K1":"orchestrationHostname","K2":"function-orchestrator-main-orchestrator-66bd767b7-r2hsm"}]}}

… or in short form, a Z507 (failed evaluation) with a Z575 (timeout), but no further details.
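
For anyone reading a response like the one above by hand, here is a minimal sketch of how the nested error types could be pulled out. It is not the orchestrator's own tooling: the helper names `zid` and `error_types` are hypothetical, and the sample is a heavily trimmed stand-in for the real response. It assumes that each error object carries its type under Z5K1, and that an inner error may be embedded as JSON text inside a string, as in the Z500K1 message above.

```python
import json

_DECODER = json.JSONDecoder()


def zid(value):
    """Return a ZID from a plain string or a {"Z1K1": "Z9", "Z9K1": ...} reference."""
    if isinstance(value, str):
        return value
    if isinstance(value, dict) and "Z9K1" in value:
        return zid(value["Z9K1"])
    return None


def error_types(node):
    """Yield every Z5K1 error type found in node, outermost first."""
    if isinstance(node, dict):
        if "Z5K1" in node:
            found = zid(node["Z5K1"])
            if found is not None:
                yield found
        for key, child in node.items():
            if key != "Z5K1":
                yield from error_types(child)
    elif isinstance(node, list):
        for child in node:
            yield from error_types(child)
    elif isinstance(node, str):
        # An inner error may be serialized as JSON inside a message string.
        start = node.find("{")
        if start == -1:
            return
        try:
            embedded, _end = _DECODER.raw_decode(node, start)
        except json.JSONDecodeError:
            return
        yield from error_types(embedded)


# Trimmed, hand-written sample that mirrors the shape of the response above.
sample = {
    "Z1K1": "Z5",
    "Z5K1": "Z507",
    "Z5K2": {
        "Z507K2": {
            "Z1K1": "Z5",
            "Z5K1": "Z500",
            "Z5K2": {
                "Z500K1": "Function evaluation failed with status 504: "
                          '{"Z1K1":{"Z1K1":"Z9","Z9K1":"Z5"},'
                          '"Z5K1":{"Z1K1":"Z9","Z9K1":"Z575"}}'
            },
        },
    },
}

print(list(error_types(sample)))  # ['Z507', 'Z500', 'Z575']
```

On the trimmed sample this prints the chain Z507, Z500, Z575, matching the short form given above.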

When running from the deployment server, I got "Orchestration generally failed," while I did indeed get a timeout when running from the UI. Very weird.

cmassaro subscribed.

Remaining work:

  • Document work-arounds
  • Determine the cause
  • Fix the issue

This is happening again. Do we need a new ticket?

No, we can work from this one.

We still need to do all the things James mentioned: determine the cause, fix the issue, etc. So this bug should remain open as a tracking bug until we've done all of that.

As for the current outage, I've scheduled a deployment for later in the day (00:00 UTC). That means we will hopefully be back up in three hours.

Python appears to be up and running again.

Change #1202148 had a related patch set uploaded (by Jforrester; author: Jforrester):

[operations/deployment-charts@master] wikifunctions: Upgrade evaluators from 2025-10-28-150053 to 2025-11-05-063501

https://gerrit.wikimedia.org/r/1202148

Change #1202148 merged by jenkins-bot:

[operations/deployment-charts@master] wikifunctions: Upgrade evaluators from 2025-10-28-150053 to 2025-11-05-063501

https://gerrit.wikimedia.org/r/1202148

Change #1204603 had a related patch set uploaded (by Jforrester; author: Jforrester):

[operations/deployment-charts@master] [WIP] Add Python test call back in

https://gerrit.wikimedia.org/r/1204603

Change #1204603 merged by jenkins-bot:

[operations/deployment-charts@master] wikifunctions: Add Python test call back in to test script

https://gerrit.wikimedia.org/r/1204603

Recurrence reported on Telegram at 09:12 UTC on 2025-11-30.

Although an emergency fix has been deployed with some success (thanks), we are still getting random, intermittent timeouts like this one.

This succeeded immediately after this failed.

Please also see "integers have the same sign, python" (Z17272), where the first and third tests in the list failed and the other tests passed, at more or less the same time.

Change #1217219 had a related patch set uploaded (by Jforrester; author: Jforrester):

[operations/deployment-charts@master] wikifunctions: Upgrade evaluators from 2025-12-08-185405 to 2025-12-10-150641

https://gerrit.wikimedia.org/r/1217219

Change #1217219 merged by jenkins-bot:

[operations/deployment-charts@master] wikifunctions: Upgrade evaluators from 2025-12-08-185405 to 2025-12-10-150641

https://gerrit.wikimedia.org/r/1217219