
An outdated version of an amended implementation may be used for several hours
Open, Medium | Public | BUG REPORT

Assigned To
Authored By
GrounderUK
Oct 8 2025, 9:01 PM
Referenced Files
F69556335: IMG_1395.png
Nov 2 2025, 6:27 PM
F66765692: IMG_1390.png
Oct 20 2025, 8:49 AM
F66765669: IMG_1389.png
Oct 20 2025, 8:49 AM
F66740171: image.png
Oct 9 2025, 11:01 AM
F66739294: IMG_1382.png
Oct 8 2025, 9:01 PM
F66739293: IMG_1381.png
Oct 8 2025, 9:01 PM

Description

Steps to replicate the issue (include links if applicable):

What happens?:
Currently, all tests fail. All have this error:
Error type: Error in evaluation
Stack trace:
Unspecified error
Validation error type: Key not found

What should have happened instead?:
All tests should pass. All tests have previously passed. Nothing has changed.

Software version (on Special:Version page; skip for WMF-hosted wikis like Wikipedia):

Other information (browser name/version, screenshots, etc.):

The implementation originally timed out in evaluation when there was only one test case. After Publish, the error changed to “inconsistent use of tabs and spaces”. Fixing this resolved the timeout and the test passed. After adding Z28720, that test would not pass, although the same call worked in “Try this function” (11:43).
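For anyone wanting to rule out the tabs-and-spaces problem before publishing, here is a minimal sketch. It assumes the implementation is ordinary Python source that can be checked locally. The helper name `find_tab_error` and both sample sources are made up for illustration and are not part of Wikifunctions.

```python
def find_tab_error(source):
    """Return Python's TabError message if the source mixes tabs and
    spaces ambiguously, otherwise None. Hypothetical helper."""
    try:
        # compile() only parses the code; nothing is executed.
        compile(source, "<implementation>", "exec")
    except TabError as err:
        return err.msg
    return None

# Hypothetical implementation bodies: in `mixed`, one line is indented
# with a tab and the next with eight spaces.
mixed = "def example(x):\n\tvalue = x\n        return value\n"
clean = "def example(x):\n    value = x\n    return value\n"

print(find_tab_error(mixed))
print(find_tab_error(clean))
```

This only checks indentation consistency. It says nothing about which saved version of the code the evaluator actually runs.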

Noted on Telegram “I’m getting inconsistent evaluations of tests for Z28715. All tests have passed, but there have been failures when creating a test case when the same call succeeds in Try this. Then the test cases passed after Publish and now they have failed again 🤷‍♂️ No time to file a ticket right now.”

Subsequently, all tests passed again for this implementation but two started failing for Z28718. These still show as failed but pass in edit.

IMG_1381.png (2×960 px, 168 KB)
All tests failing

IMG_1382.png (2×960 px, 161 KB)
All tests passing (during edit)

Event Timeline

I've got problems with other Python implementations too. E.g. try to run https://www.wikifunctions.org/view/en/Z19943 with the input 0.1234

image.png (600×542 px, 27 KB)

I nudged a similar one with issues, and now all tests fail: https://www.wikifunctions.org/view/en/Z19933

I've raised a different ticket, T406848, for the wider Python issues in case they're different.

Jdforrester-WMF renamed this task from Erratic test results on “index of first sub-list (start)” Z28715 to Test results on “index of first sub-list (start)” Z28715 always fail on saved version, but pass in editing mode. Oct 15 2025, 5:38 PM
Jdforrester-WMF triaged this task as Medium priority.
Jdforrester-WMF added a project: WikiLambda.

Similar issue with simplified Type object from Type (Z28945). I made a material change to its (then) single implementation (Z28947) to return a Quote object, timed at 19:58. When trying to add a test case the following morning, the actual result is not a Quote object:

IMG_1389.png (2×960 px, 175 KB)

An equivalent call from “Try this…” does return a Quote object:

IMG_1390.png (2×960 px, 119 KB)

This is not a cached function call. What is happening is that a previous version of the code is being picked up and used for a new function call, but only when executed in a test case in edit mode. I believe this accounts for the previous erratic behaviour: Z28720 was being evaluated using a previous version of the code, until it was published, when it used the current version. The subsequent failures were probably just the Python outage.
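To make the suspected symptom concrete, here is a purely hypothetical sketch. It is not a description of how WikiLambda resolves code. The classes `Implementation`, `ZidOnlyStore` and `RevisionAwareStore`, the revision numbers and the source strings are all invented. The sketch only shows how a lookup keyed by ZID alone would keep handing back earlier code after an amendment, while a lookup keyed by ZID and revision would not.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Implementation:
    zid: str        # e.g. "Z28947"
    revision: int   # hypothetical revision counter
    source: str     # placeholder for the implementation code

class ZidOnlyStore:
    """Hypothetical store keyed by ZID only: the first version seen wins."""
    def __init__(self):
        self._by_zid = {}

    def resolve(self, impl):
        return self._by_zid.setdefault(impl.zid, impl.source)

class RevisionAwareStore:
    """Hypothetical store keyed by (ZID, revision): each amendment is distinct."""
    def __init__(self):
        self._by_key = {}

    def resolve(self, impl):
        return self._by_key.setdefault((impl.zid, impl.revision), impl.source)

before = Implementation("Z28947", 1, "returns a full Type")
after = Implementation("Z28947", 2, "returns a Quote")

stale, fresh = ZidOnlyStore(), RevisionAwareStore()
for store in (stale, fresh):
    store.resolve(before)   # first call happens before the amendment

print(stale.resolve(after))  # still the earlier code
print(fresh.resolve(after))  # the amended code
```

If something like the first store sat on only one execution path, it would match the pattern reported here: one path sees the amended code while the other keeps using the earlier version.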

Quite apart from the frustrating inability to create new test cases reliably, it may well be that Z28947 didn’t need to change in the first place. It was performing as expected when using “Try this…” in edit mode and I assumed it was a de-referencing issue when I started getting a full type rather than a reference for new test cases. This seems to be confirmed by new implementation Z28967 which (so far) is behaving as expected.

GrounderUK renamed this task from Test results on “index of first sub-list (start)” Z28715 always fail on saved version, but pass in editing mode to When editing a test case, an outdated version of the code for amended implementations may be used for several hours. Edited Nov 2 2025, 1:44 PM

Similar issue with (Integer, Rational number), quoted (Z29165): correct version of the code is used everywhere except when editing the test case.

At 13:50 (code was changed to expect Z29128K1 at 12:46)…

Error type: Error in evaluation
Error data:
function call: "Z29128() got an unexpected keyword argument 'Z29128K1'"
Validation error type: Argument value error
Expected result: ["Z99",{"Z1K1":"Z99","Z99K1":"Z16683"},{"Z1K1":"Z99","Z99K1":"Z19677"}]
Actual result: Z24

Error persists at 18:23
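The error text above is what Python itself reports when a function is called with a keyword its signature does not accept, which is consistent with an earlier version of the code being run. A minimal sketch of that mechanism follows. Both source strings are assumptions: the actual earlier version of Z29128 is not shown in this ticket, so the no-argument signature is only a stand-in.

```python
# Hypothetical sources. The earlier signature is a stand-in; only the
# amended version's use of Z29128K1 comes from the report.
OLD_SOURCE = "def Z29128():\n    return 'old'\n"
NEW_SOURCE = "def Z29128(Z29128K1):\n    return Z29128K1\n"

def load(source):
    """Execute the source in a fresh namespace and return the function."""
    namespace = {}
    exec(source, namespace)
    return namespace["Z29128"]

def call(func, value):
    """Call with the keyword the amended code expects; report any TypeError."""
    try:
        return func(Z29128K1=value)
    except TypeError as err:
        return f"error: {err}"

stale_result = call(load(OLD_SOURCE), 1)
fresh_result = call(load(NEW_SOURCE), 1)

print(stale_result)
print(fresh_result)
```

The first call fails with the same “got an unexpected keyword argument” wording as the test-case error, while the second succeeds.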

IMG_1395.png (2×960 px, 191 KB)

I had a slightly different manifestation recently. This was failing in edit mode. It passed as soon as it was published but went back to failing again after visiting the function page, where it showed as passing… It may be that the 22:44 edit on the test case caused the reversion.

Any chance of an update on this one, please? Is the time taken for the system to fix itself predictable?

It looks like this is no longer limited to test-case editing. I changed Z32879 at 15:15 yesterday and the previous version of the implementation is still being used for new function calls and only works correctly when editing. This is the reverse of the original problem and much more disruptive.

GrounderUK renamed this task from When editing a test case, an outdated version of the code for amended implementations may be used for several hours to An outdated version an amended implementation may be used for several hours. Mar 30 2026, 7:56 AM
GrounderUK renamed this task from An outdated version an amended implementation may be used for several hours to An outdated version of an amended implementation may be used for several hours.

Good luck with this one @gengh please don't hesitate to ask the community if we can help in any way 🙌