get nth element of a list fails too much
Closed, Resolved | Public | BUG REPORT

Description

Steps to replicate the issue (include links if applicable):

What happens?:
The test fails with all implementations; it is unclear why.

What should have happened instead?:
It should pass or get fixed.

Either the test or the implementations could be faulty, and it is unclear whether the error is in the data or in the system, so investigation is needed. The issue was reported by Aaron Liu at https://www.wikifunctions.org/wiki/Wikifunctions:Project_chat#Internal_server_error_in_running_a_composition and investigated a bit by Dv103.

Software version (on Special:Version page; skip for WMF-hosted wikis like Wikipedia):

Other information (browser name/version, screenshots, etc.):

Event Timeline

This is happening for a couple of reasons.

First, there is a timeout due to the sizes of types. This is being addressed separately (https://gitlab.wikimedia.org/repos/abstract-wiki/wikifunctions/function-schemata/-/merge_requests/228).

Second, there is a type conversion issue. Because the "get nth element of a list" function is declared with a return type of Z1, it can't be determined whether the result has a type-converter. Because Gregorian months are type-converted to integers, the function returns an integer, and then the executor explodes because default type-conversion can't handle integers. This part can't really be fixed. We'd need a type-specific "nth element of list" function (or a generic function to return an "nth element of list" function, parameterized by return type).

On your second point: before I start on a workaround for this issue, can you comment on how it works in the built-in function https://www.wikifunctions.org/view/en/Z811?

P.S. I'm also not sure why your example is Gregorian months. The untyped month list https://www.wikifunctions.org/view/en/Z16249 is the only test that passes!

The reason it works in the builtin function is also why the test you mention passes. In both cases, the implementation uses compositions instead of evaluated code. Compositions do not use type conversion, so this issue does not arise.

To explain further: the root issue here is that 1) a function defines its output type as Z1 and 2) it operates over types that define type converters. Because of 2), the inputs (in this case, Gregorian months) get converted to Python ints. Because of 1), there's no way to determine what the type converter back to ZObject language should be (we can only infer that when we know the output type). Therefore, the function ends up returning an int, but the system has no way of knowing what to do with that int. Because an int is an inadmissible object, the executor complains that it's unable to type-convert it.
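The mechanism described above can be illustrated with a toy model. This is only a sketch under stated assumptions: the dict-based ZObject representation, the converter tables, and every function name below are hypothetical and are not taken from the Wikifunctions executor. It shows only why a declared output type of Z1 leaves no way to pick a converter back from a native int.

```python
# Toy model of the failure mode. All names here are hypothetical; this is
# not the Wikifunctions executor.

class ConversionError(Exception):
    """Raised when a native value cannot be turned back into a ZObject."""


def month(n):
    # A ZObject is modelled as a dict carrying its type under "Z1K1".
    return {"Z1K1": "Gregorian month", "value": n}


# Per-type converters: ZObject -> native value, and native value -> ZObject.
TO_CODE = {"Gregorian month": lambda z: z["value"]}
FROM_CODE = {"Gregorian month": month}


def to_code(zobject):
    # Types without a registered converter are passed through unchanged.
    converter = TO_CODE.get(zobject["Z1K1"])
    return converter(zobject) if converter else zobject


def from_code(value, declared_output_type):
    # The converter back is chosen from the DECLARED output type only.
    converter = FROM_CODE.get(declared_output_type)
    if converter is not None:
        return converter(value)
    if isinstance(value, dict) and "Z1K1" in value:
        return value  # already a ZObject, nothing to convert
    raise ConversionError(
        f"cannot convert {value!r} for output type {declared_output_type}"
    )


def run_nth_element(zlist, index, declared_output_type):
    native_items = [to_code(item) for item in zlist]  # months become ints
    result = native_items[index]                      # stands in for the evaluated code
    return from_code(result, declared_output_type)
```

In this model, calling `run_nth_element` on a list of months with a declared output type of "Gregorian month" returns a month object, while the same call with "Z1" raises `ConversionError`, because the bare int carries no information about which converter to apply.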

The only fixes here are as follows: EITHER to create a type-specific list access function (in this case, one that returns a Gregorian month) OR to rely only on compositions for this particular use case.

Thanks, that helps. I've already been building a recursive composition here.

Are you saying that the built-in for Z811 is a composition?? Which functions is it composing? (I can't see the contents of built-ins, but it's a surprise to me that any of them are compositions...)

Ah, sorry, Z811 is not a composition. I was trying to draw a distinction between functions that use evaluated code and those that don't. I should have said, "builtins OR compositions consisting only of builtins and compositions thereof."

Also, for the ambitious, there is a third option. The "get nth element of list" function could instead be a function that takes a type as input and returns a function. If the type is X, the returned function would accept Z881(X), Natural number as input and return an X. Last time I tried something like this, it didn't work, but that was about two years ago, so maybe functions-that-return-functions are possible now.
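As a rough illustration of this third option, here is a sketch in plain Python of a function that takes a type and returns a list-access function carrying that type as its declared return type. `TypedFunction` and `make_nth_element` are hypothetical names invented for this example, and types are modelled as plain strings; whether Wikifunctions can express functions that return functions is exactly the open question raised above.

```python
# Hypothetical sketch: none of these names exist in Wikifunctions.
from dataclasses import dataclass
from typing import Any, Callable


@dataclass(frozen=True)
class TypedFunction:
    """A callable that records its declared argument and return types."""
    argument_types: tuple
    return_type: str
    body: Callable[[list, int], Any]

    def __call__(self, items, index):
        return self.body(items, index)


def make_nth_element(element_type: str) -> TypedFunction:
    # The returned function accepts Z881(X) and a natural number and is
    # declared to return X, so the output type is never Z1.
    return TypedFunction(
        argument_types=(f"Z881({element_type})", "Natural number"),
        return_type=element_type,
        body=lambda items, index: items[index],
    )
```

The list-access logic is written once, and each call such as `make_nth_element("Gregorian month")` yields a function whose precise return type would let a system select the matching converter.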

Change #1151782 had a related patch set uploaded (by Jforrester; author: Jforrester):

[mediawiki/extensions/WikiLambda@master] Update function-schemata sub-module to HEAD (e27d97f)

https://gerrit.wikimedia.org/r/1151782

@cmassaro Seems to be fixed, could you please close it?

I guess it's still a disappointing outcome, because the code implementations were working before, and we've had to disable them and switch to the much slower recursive composition. Many of the failures are this new "unspecified error", which is now popping up everywhere, so I was still hopeful that when something gets fixed, it could handle these too.

Yeah :-/. This is a casualty of how we're handling type conversion, which relies on precise type specification for outputs. I think that the situation with type conversion will improve over the next year, but I don't see any outcome where Z1 as an output type will continue to yield good results.

Would it be helpful if it were easier to parameterize functions like "get nth element of a list" by output type? I welcome you to open another Phabricator task so we can discuss options.

@cmassaro Seems to be fixed, could you please close it?

Done.

Z1 as an output type is the standard for most/all of the built-in list functions, and we've built out all the interesting stuff from there.

Can you show me what you mean by "parameterizing functions by output type"? Do you mean making a version of this function for every different type we have? That would seem mind-numbing to create if all the code implementations were actually the same...

I'll have a think about another phab request, but I don't really know what to ask for at this stage... it all just seems so messy!

Change #1151782 merged by jenkins-bot:

[mediawiki/extensions/WikiLambda@master] Update function-schemata sub-module to HEAD (4680f99)

https://gerrit.wikimedia.org/r/1151782

Change #1153616 had a related patch set uploaded (by Jforrester; author: Jforrester):

[operations/deployment-charts@master] wikifunctions: Update evaluators from 2025-05-21-192515 to 2025-06-03-205630

https://gerrit.wikimedia.org/r/1153616

Change #1153617 had a related patch set uploaded (by Jforrester; author: Jforrester):

[operations/deployment-charts@master] wikifunctions: Update orchestrator from 2025-05-21-192453 to 2025-06-03-231524

https://gerrit.wikimedia.org/r/1153617

Change #1153616 merged by jenkins-bot:

[operations/deployment-charts@master] wikifunctions: Update evaluators from 2025-05-21-192515 to 2025-06-03-205630

https://gerrit.wikimedia.org/r/1153616

I'm not someone who has any insight into the WikiLambda code, but perhaps this is more reason to allow WikiLambda to natively handle storing integers (perhaps 32-bit ones) instead of having to rely on strings?

I can't speak to the benefits of native integers (I think this would be hard to do). However, I will note that this problem doesn't only afflict numeric types. This failure mode will affect every type that wants to use type conversion, so it's important to address the root cause.

Change #1153617 merged by jenkins-bot:

[operations/deployment-charts@master] wikifunctions: Update orchestrator from 2025-05-21-192453 to 2025-06-04-185118

https://gerrit.wikimedia.org/r/1153617

[snip…] This failure mode will affect every type that wants to use type conversion, so it's important to address the root cause.

Update:

We now have a function that determines whether custom conversion might apply: "Type has custom converters from code" (Z28688). Based on this, "untype list if custom converters" (Z28691) is implemented to return a Z1-typed list where custom conversion from code might apply. Currently, this is used in the preferred implementation of Z13397, which is able to retrieve the 1651st statement in the fetched Wikidata item for the United States (currently the last statement).
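The behaviour described for these two functions can be sketched roughly as follows. This is not the implementation of Z28688 or Z28691: the registry of types, the tuple representation of a typed list, and the Python function names are all assumptions made for illustration.

```python
# Hypothetical sketch of the described behaviour, not the real implementations.

# Assumed registry of types that define custom converters from code.
TYPES_WITH_CUSTOM_CONVERTERS = {"Gregorian month"}


def has_custom_converters(type_name):
    # Stands in for "Type has custom converters from code" (Z28688).
    return type_name in TYPES_WITH_CUSTOM_CONVERTERS


def untype_list_if_custom_converters(typed_list):
    # A typed list is modelled as (element_type, items). Stands in for
    # "untype list if custom converters" (Z28691).
    element_type, items = typed_list
    if has_custom_converters(element_type):
        return ("Z1", items)  # same items, but the list is now Z1-typed
    return typed_list         # no custom conversion applies, leave as is
```

Lists whose element type has no custom converters are returned unchanged in this sketch, so only lists where conversion might apply are retyped.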

There are currently 28 implementations of other functions and 6 test cases for other functions that rely on this function, so it would be great if we could avoid accidentally breaking it. I suggest that "last statement in Italy" (Z28710) would be an interesting test case to keep your eye on.

@99of9 As far as I can tell, this function is now safe to use in any context. The performance will be a bit better if you call element by index (helper) directly, in a context where no custom conversion applies. Is this still too “messy”? Like you, I’m not sure what Cory meant by “parameterize functions … by output type”, but it would seem convenient to be able to pass a type argument into a Z1-typed function and let it be assumed that conversion from code would use that type’s converter (if any) on the result or guarantee that only an object of the given type could be returned. For a properly typed list, that would be the list’s type, but where Z1 conceals union types (like a Wikidata statement’s subject) such an approach might ensure that the returned object has one of the types specified in the input argument.

By "parameterize functions ... by output type," I was thinking of an old proposal to make Z8/Function into a generic type, where the argument types and return types would be parameters to the function generating the type. This would be a huge change, but it would resolve a lot of problems like this one.

Is this still too “messy”?

I won't know until it has fully sunk in, but it seems like a very important improvement. Certainly getting to element 1651 is far more than I could previously hope for.

By "parameterize functions ... by output type,"

Thanks, Cory. I promise not to lose any sleep over T322052 any time soon!