Consider using list inputs to create variadic apply, filter, map, etc. functions
Open, Medium, Public

Description

This has been requested a fair bit by the community (see child tasks). There was a recent request on IRC for a multi-argument version of Filter, so I'd like to reopen this discussion.

We could operationalize this as follows:

  • the inputs would be accepted as lists (rather than single arguments);
  • the elements of each list input would be mapped, in order, to K1, K2, etc. in the generated Z7.

Desired behavior/Acceptance criteria (returned value, expected error, performance expectations, etc.)

  • make functions that accept functions more flexible (in terms of argument structure)

Event Timeline

Passing arguments to a filter function would mean we could avoid creating functions like Z23605. It would also allow such a function (if created) to be implemented without creating additional helper functions like Z22490 or (in a different context) Z23147.

Would these be additional functions and, if so, would we deprecate the existing ones?

Noting T370028, would the list’s type be inferred from the return type of the function argument, or derived (“heuristically”) from the results (which wouldn’t work for an empty list)? A reasonable approach might be to use the return type of the function argument when it is specific (not Z1, Z1-list, etc.). Alternatively, the type of list required/expected could be specified as an optional additional argument, defaulting to the function argument’s return type.
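As a rough illustration of how those options could be combined, here is a minimal Python sketch. It is not how Wikifunctions decides list types today; the function name, the precedence order (explicit argument, then specific return type, then inference from results) and the treatment of Z1 as the only "non-specific" type are all assumptions made for the example. Type ZIDs are handled as opaque strings.

```python
from typing import Optional, Sequence

# Assumption: only Z1 counts as a "non-specific" return type here.
GENERIC_TYPES = {"Z1"}


def choose_element_type(
    declared_return_type: str,
    result_types: Sequence[str],
    explicit_type: Optional[str] = None,
) -> str:
    """Pick the element type for the list returned by a map-like function."""
    # 1. An explicitly supplied type (the optional extra argument) wins.
    if explicit_type is not None:
        return explicit_type
    # 2. Otherwise use the function argument's return type when it is specific.
    if declared_return_type not in GENERIC_TYPES:
        return declared_return_type
    # 3. Otherwise infer from the results, if they all share one type.
    distinct = set(result_types)
    if len(distinct) == 1:
        return next(iter(distinct))
    # 4. Empty or mixed results: fall back to the generic type.
    return "Z1"
```

Under this ordering an empty result list is only a problem in step 3, which is why the declared return type is consulted first.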

We should also consider whether the function argument could ultimately support a composition. In any event, we could explore specifying the proposed list of arguments (in the UI) as if a composition were being constructed.

They should be different functions to the existing ones, and the decision about deprecation should be deferred until we have the new ones in place. That way nothing breaks and we can test everything sequentially.

Jdforrester-WMF subscribed.

Over to Product to decide if we're doing this.

Product review:

  • Sure, but not a high priority.

Who can we talk to about the prioritisation here?

I just came across yet another roadblock where I really wanted this. It is directly aligned with the last few quarterly plans, where we try to make stuff for Wiktionaries.

In this week's volunteers' corner, the composition https://www.wikifunctions.org/view/en/Z29187 needs a table put in for "table's rows". Wouldn't it be simpler if that table could be generated by a function? The function would call a function for each of its elements: present_tense(lemma, grammatical_person, grammatical_number). Note that we need to apply a function with 3 parameters, or maybe more if this was generalised by language etc.

I made an example of a table generating function at https://www.wikifunctions.org/view/en/Z29286, but again, it's limited to 2 input dimensions.
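To make the request concrete, here is a small Python sketch of what a list-based apply would allow for the table case: one argument list per cell, each with three entries. The `present_tense` body is a placeholder that only labels its inputs (it is not a real conjugation function), and `apply_to_each` is a hypothetical name, not an existing Wikifunctions function.

```python
from itertools import product


def present_tense(lemma: str, person: str, number: str) -> str:
    # Placeholder: returns a label for the cell, not a real conjugated form.
    return f"{lemma}[{person},{number}]"


def apply_to_each(function, argument_lists):
    """Call `function` once per argument list, spreading the list as arguments."""
    return [function(*args) for args in argument_lists]


persons = ["1", "2", "3"]
numbers = ["singular", "plural"]

# One argument list per table cell; adding a fourth dimension (e.g. language)
# only means adding another entry to each list.
argument_lists = [["go", p, n] for p, n in product(persons, numbers)]
cells = apply_to_each(present_tense, argument_lists)
```

Because the arity lives in the argument lists rather than in the signature of the applying function, the same helper covers 2, 3 or more input dimensions.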

I think this is either the 3rd or 4th volunteers' corner where it would have directly helped.

This should be very quick to implement. We'd need to

  1. Allocate ZIDs for these functions and
  2. Write the built-in implementations.

For the built-in functions, we'd have to think about how to name the arguments. Let's say that we have

Map( Function: Z8, ArgumentLists: Z881( Z1 ) )

Let's further say that we call this with a Function whose identity is Z400.

For each argument list, we'd create a Z7 like

{
    Z1K1: 'Z7',
    Z7K1: <Function>,
    K1: <argument list[0]>,
    K2: <argument list[1]>,
    ...
    KN: <argument list[n-1]>
}

OR

{
    Z1K1: 'Z7',
    Z7K1: <Function>,
    Z400K1: <argument list[0]>,
    Z400K2: <argument list[1]>,
    ...
    Z400KN: <argument list[n-1]>
}

The former option (local keys) should work in all cases, but it would make argument references more difficult to deal with (since all local keys are named in the same way).

For the latter option, we could try to infer the function identity from Function's definition, but that would fail in some cases (e.g., Z8s defined in-line). Alternatively, we could simply have the identity be provided as an argument to Map (and the other functions) to avoid this issue.
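The two key-naming options can be sketched in Python as follows. This is only an illustration of the shapes above, not the built-in implementation: `build_call` and `variadic_map` are hypothetical names, the ZObjects are modelled as plain dicts, and each argument list is assumed to be a plain Python list without a leading type element.

```python
from typing import Any, Dict, List, Optional


def build_call(function: Any, args: List[Any], identity: Optional[str] = None) -> Dict[str, Any]:
    """Build a Z7-shaped dict for one argument list.

    With identity=None the keys are local (K1, K2, ...); with an identity
    such as "Z400" they are global (Z400K1, Z400K2, ...).
    """
    prefix = identity or ""
    call: Dict[str, Any] = {"Z1K1": "Z7", "Z7K1": function}
    for index, arg in enumerate(args, start=1):
        call[f"{prefix}K{index}"] = arg
    return call


def variadic_map(function: Any, argument_lists: List[List[Any]],
                 identity: Optional[str] = None) -> List[Dict[str, Any]]:
    """Produce one Z7-shaped dict per argument list (evaluation is out of scope)."""
    return [build_call(function, args, identity) for args in argument_lists]
```

For example, `build_call("Z400", ["a", "b"])` yields local keys `K1` and `K2`, while `build_call("Z400", ["a", "b"], identity="Z400")` yields `Z400K1` and `Z400K2`. Passing the identity explicitly corresponds to the last option above, where the caller supplies it as an argument.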

I'm happy to put together a design document, but I want to give an opportunity here for people to express their preferences.

Admittedly, soon after I loudly complained, Al Grounder showed me that we can already manually do something like your option 1 (at least) for any finite Apply. For example, I've now got apply3 working at Z29365. So the urgency of the list version is no longer as clear. But if it's easy to implement, it will surely prove useful too.

As to your actual models, I'm afraid I can't see through the subtleties far enough to have a useful opinion at this stage.

Ah, good to know! I figured it was possible to do in the composition language, but not necessarily easy or efficient. If there's already a recipe for this that works without too much trouble, good 😄.

That said, please do speak up again if the need becomes urgent! It wouldn't take very much time to write these builtins, but I want to proceed carefully: once we commit to a set of ZIDs and an implementation, there's no easy road back.

I'm not sure if it's actually possible to do in the regular interface. I went into manual edit mode to copy what Denny/Al had done. But in any case, that level of difficulty is worth it for this kind of rare but enabling need.

I think it is possible using just the UI, which is how I did Z29349. When you choose to specify a function call’s Z7K1 other than by reference, the UI enables an “add argument” option for the object specifying the Z7K1 (the call to Z29350/Function identity, in this case).

[…snip…]

I'm happy to put together a design document, but I want to give an opportunity here for people to express their preferences.

I have a preference for using global keys because they are searchable in resultant persistent objects. I’m not sure what the problem is with falling back to local keys when the function has no explicit reference. However, I noticed when creating vararg function call, Python (Z29355) that local keys were simply ignored in ZObject construction, and an object with no arguments was returned.

More generally, it should be possible for the user interface to display the appropriate label for a global key and (eventually) to require an object of the appropriate type. Currently, the “add argument” option seems to support only local keys and requires the type to be specified for each one (even for a Typed pair).

I feel a little queasy about a fallback mechanism that implicitly changes what gets put into the scope. Consider a case like this: let's say we have a function call

Apply( Z400, [ 'Z6', 'somestring' ] )

and Z400 points to some Z8, call it F.

Now let's say I call

Apply( F2, [ 'Z6', 'somestring' ] )

where F2 is a Z8 literal, functionally the same as F, except that its Z8K5 isn't a Z9 (this is allowed by the system).

The two function calls would generate Z7s like, respectively,

{ Z1K1: 'Z7', Z7K1: 'Z400', Z400K1: 'somestring' }

and

{ Z1K1: 'Z7', Z7K1: F2, K1: 'somestring' }

Now, let's say that Z400/F/F2 all contain an internal Z18 somewhere like

{ Z1K1: 'Z18', Z18K1: 'K1' }

In that case, the two ostensibly identical calls above would have different behavior, depending on the value of K1 that was (or was not) captured from some enclosing scope.

To be fair, this is a corner case, so I'm not strenuously against the idea.
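A toy Python model of the corner case, for anyone who wants to see it play out. It assumes that an argument reference is looked up first in the call's own bindings and then in an enclosing scope; that lookup rule, the `resolve` helper and the "outer value" binding are assumptions for illustration, not a description of the orchestrator's actual scoping code.

```python
from collections import ChainMap
from typing import Any, Dict, Optional


def resolve(reference_key: str, call_bindings: Dict[str, Any],
            enclosing_scope: Dict[str, Any]) -> Optional[Any]:
    """Resolve a Z18-style reference: call bindings first, then the enclosing scope."""
    scope = ChainMap(call_bindings, enclosing_scope)
    return scope.get(reference_key)


# Hypothetical enclosing scope that happens to bind a local key K1.
enclosing = {"K1": "outer value"}

# Bindings produced by the two Z7 shapes above.
global_key_call = {"Z400K1": "somestring"}  # Apply( Z400, ... )
local_key_call = {"K1": "somestring"}       # Apply( F2, ... )

# The internal reference { Z1K1: 'Z18', Z18K1: 'K1' } resolves differently.
via_global = resolve("K1", global_key_call, enclosing)  # "outer value"
via_local = resolve("K1", local_key_call, enclosing)    # "somestring"
unbound = resolve("K1", global_key_call, {})            # None: nothing to capture
```

In this model the global-key call either captures an unrelated K1 from outside or finds nothing at all, while the local-key call sees the intended argument.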

I’m not strenuously in favour!

In practice, we just have objects in context and the need to reference them, so I see it as more of a local-object reference than an explicitly bound argument reference. Where the function’s identity is known, it is useful to reference it and constrain the arguments according to its signature. In such a case, the global key identifier makes more sense to me. Currently, however, we don’t constrain the arguments to an applied function, so the local identifier is more appropriate.

So, really, it’s not about “falling back”, as such, so much as codifying the user interaction. In the case where the user specifies the argument’s type, it is a local key, and where the system determines the required type, it is a global key, pointing back to the pertinent Z17K1/type. I think this amounts to a simple statement of fact: whether any identifiable Z17K1/type was referenced in the argument binding at all and, if so, which one (Z17K2). Doesn’t that mean the variadic functions themselves would be agnostic on this question and just accept a Z8K1-list (or a list of Z17-lists, perhaps)?

[Not in this universe… Apologies for the confusion!]
Perhaps the nearest equivalent is that each ArgumentList is a [Z39, Z1] typed pair.