
Rethink wikilambdasearch_labels ActionAPI so that it allows complex type/return_type combinations
Closed, ResolvedPublic

Authored By
gengh
May 19 2025, 5:22 PM

Description

Currently type and return_type have rather strict restrictions:

  • only one value per request
  • they are combined using OR operator

https://www.mediawiki.org/wiki/Extension:WikiLambda/API#wikilambdasearch_labels

However, with persisted Function Calls we have more complex requirements for this API.
These are the use cases that require different search patterns:

| use case | expect | allowed return types | what to do with result | query by | manually exclude |
| --- | --- | --- | --- | --- | --- |
| 1. function input or function output | anything that resolves to a Z4 (strict) | Z4, Z8, Z7 | Z4 and Z7: create reference, Z8: create function call | (type = Z4 OR return_type = Z4) | Z6884 |
| 2. function call function (Z7K1) parent type open | any function | Z8 | create reference | (type = Z8) | Z6884 (unless top level) |
| 3. function call function (Z7K1) parent type bound | any function that returns parentType (not strict) | Z8 | create reference | (type = Z8) AND (return_type = parentType OR return_type = Z1) | Z6884 (unless top level) |
| 4. type reference | anything that can be used as a Z4 | Z4, Z7 | create reference | (type = Z4) OR (return_type = Z4 AND type = Z7) | |

Desired behavior/Acceptance criteria

  • The filter API is sufficiently flexible to adjust to all our use cases without needing multiple calls
  • The UI requests objects according to the requirements, this includes:
    • TypeSelector component
    • ZReference component

An example of current labels table

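| wlzl_zobject_zid | wlzl_type | wlzl_return_type | wlzl_label | etc... |
| --- | --- | --- | --- | --- |
| Z11 | Z4 | NULL | a normal type | |
| Z12 | Z4 | NULL | another normal type | |
| Z801 | Z8 | Z1 | a function that returns anything | |
| Z881 | Z8 | Z4 | a function that returns type | |
| Z866 | Z8 | Z40 | a function that returns boolean | |
| Z6884 | Z8 | Z4 | enum function, returns type | |
| Z10000 | Z7 | Z4 | one persisted enum | |
| Z10002 | Z7 | Z4 | another persisted enum | |
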
  • use case 1: If I want to select a type for function input or output: Z11, Z12, Z881, Z10000, Z10002, Z6884*
  • use case 2: If I want to select a function for a function call Z7K1, and parent type can be anything: Z801, Z881, Z866, Z6884*
  • use case 3: If I want to select a function for a function call Z7K1, and parent type is bound to Z4: Z801, Z881, Z6884*
  • use case 4: If I want to select a type for Z1K1 or for Z3K1: Z11, Z12, Z10000, Z10002

(zids marked with * will be manually excluded from results, but the query is expected to return them)
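
As a quick sanity check, the four "query by" expressions can be evaluated against the example rows in plain Python. This is only an illustrative sketch: the `LABELS` list and the `select` helper are made up for this example, and `parentType` is taken to be Z4 for use case 3.

```python
# Rows copied from the example labels table: (zid, type, return_type).
LABELS = [
    ("Z11", "Z4", None),
    ("Z12", "Z4", None),
    ("Z801", "Z8", "Z1"),
    ("Z881", "Z8", "Z4"),
    ("Z866", "Z8", "Z40"),
    ("Z6884", "Z8", "Z4"),
    ("Z10000", "Z7", "Z4"),
    ("Z10002", "Z7", "Z4"),
]


def select(predicate):
    """Return the zids of all rows for which predicate(type, return_type) holds."""
    return [zid for zid, typ, ret in LABELS if predicate(typ, ret)]


# One predicate per use case, transcribed from the "query by" column.
use_case_1 = select(lambda t, r: t == "Z4" or r == "Z4")
use_case_2 = select(lambda t, r: t == "Z8")
# parentType is assumed to be Z4 here.
use_case_3 = select(lambda t, r: t == "Z8" and (r == "Z4" or r == "Z1"))
use_case_4 = select(lambda t, r: t == "Z4" or (r == "Z4" and t == "Z7"))
```

The results still include Z6884 where the list above marks it with *, since that exclusion is expected to happen in the caller rather than in the query.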


Options

DECISION: Option 3 to enable Option 1.a

Option 1.a
  1. edit secondary table’s details, add return_type to non-function and non-function-call objects
  2. change type and return_type to be lists of strings
  3. while building the query, aggregate type and return_type conditions with an AND operator

use case 1:

  • type=null
  • return_type=Z4
  • build sql condition: WHERE return_type='Z4'
  • matches objects: Z11, Z12, Z881, Z6884, Z10000, Z10002

use case 2 (return type unbound):

  • type=Z8
  • return_type=null
  • build sql condition: WHERE type='Z8'
  • matches objects: Z801, Z881, Z866, Z6884

use case 3 (return type bound):

  • type=Z8
  • return_type=[Z4, Z1]
  • build sql condition: WHERE type='Z8' AND return_type IN ('Z4', 'Z1')
  • matches objects: Z801, Z881, Z6884

use case 4:

  • type=[Z4, Z7]
  • return_type=Z4
  • build sql condition: WHERE type IN ('Z4', 'Z7') AND return_type='Z4'
  • matches objects: Z11, Z12, Z10000, Z10002
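
The Option 1.a query building could look roughly like the sketch below, which runs against an in-memory SQLite table. Everything here is an assumption made for illustration: the table and column names are simplified, `build_condition` and `search` are hypothetical helpers, and the rows reflect the state after step 1, with non-function objects given a return_type equal to their own type.

```python
import sqlite3

# Example rows AFTER the backfill in step 1 (assumption: non-function
# objects get return_type set to their own type, so Z11 and Z12 carry 'Z4').
ROWS = [
    ("Z11", "Z4", "Z4"),
    ("Z12", "Z4", "Z4"),
    ("Z801", "Z8", "Z1"),
    ("Z881", "Z8", "Z4"),
    ("Z866", "Z8", "Z40"),
    ("Z6884", "Z8", "Z4"),
    ("Z10000", "Z7", "Z4"),
    ("Z10002", "Z7", "Z4"),
]


def build_condition(types=None, return_types=None):
    """Build a WHERE clause and its bound values from two optional lists."""
    clauses, params = [], []
    for column, values in (("type", types), ("return_type", return_types)):
        if values:
            placeholders = ", ".join("?" for _ in values)
            clauses.append(f"{column} IN ({placeholders})")
            params.extend(values)
    # Conditions on different columns are combined with AND.
    return " AND ".join(clauses) or "1=1", params


def search(types=None, return_types=None):
    """Run the built condition against a throwaway in-memory table."""
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE labels (zid TEXT, type TEXT, return_type TEXT)")
    db.executemany("INSERT INTO labels VALUES (?, ?, ?)", ROWS)
    where, params = build_condition(types, return_types)
    cursor = db.execute(f"SELECT zid FROM labels WHERE {where} ORDER BY rowid", params)
    zids = [zid for (zid,) in cursor]
    db.close()
    return zids
```

With these rows, `search(return_types=["Z4"])` covers use case 1 and `search(types=["Z4", "Z7"], return_types=["Z4"])` covers use case 4.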

Pros and cons:

  • ❌ We’d need to run secondary updates for all objects in production
    • Alternative: We can build a script that just iterates through the labels table and, if return_type is NULL, sets it to the type value
    • Alternative: We can avoid this by adding some exception logic to the query building system: Option 1.b
  • ✅ Pretty simple changes to API props:
    • Make them lists of strings, to be able to do type IN ('Z4', 'Z7') or return_type IN ('Z4', 'Z1')
    • We could remove strict property, which has always been a bit confusing

Option 1.b
  1. aggregate type and return_type with an AND operator
  2. aggregate values inside the array with an OR operator
  3. exception: if return_type is present but type is not, add OR type = return_type to the query:
    • e.g. type=Z8 and return_type=Z4 would build SELECT * WHERE type='Z8' AND return_type='Z4' <-- all parameters present, aggregate conditions with AND
    • e.g. type=null and return_type=Z4 would build SELECT * WHERE return_type='Z4' OR type='Z4' <-- type not present, duplicate the return_type and aggregate with OR
  4. exception: if type has values that are not Z8 or Z7 and return_type is present, add OR return_type IS NULL to the return type condition
  5. doesn't require any changes on the database content

use case 1:

  • type=null
  • return_type=Z4
  • build sql condition: WHERE return_type='Z4' OR type='Z4' <-- exception when type is missing
  • matches objects: Z11, Z12, Z881, Z6884, Z10000, Z10002

use case 2 (return type unbound):
same as Option 1.a

use case 3 (return type bound):
same as Option 1.a

use case 4:

  • type=[Z4, Z7]
  • return_type=Z4
  • build sql condition: WHERE ( type='Z4' OR type='Z7' ) AND ( return_type='Z4' OR return_type IS NULL )
  • matches objects: Z11, Z12, Z10000, Z10002
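
To make the two exceptions of Option 1.b concrete, here is a minimal sketch of a condition builder. The function name `build_condition_1b` is hypothetical, and values are inlined into the SQL string only so the output is easy to read; real code would bind them as parameters.

```python
def build_condition_1b(types=None, return_types=None):
    """Return a WHERE condition following the Option 1.b rules (sketch only)."""
    types, return_types = types or [], return_types or []

    def in_list(column, values):
        # Values inside one list are alternatives (OR), written here as IN (...).
        quoted = ", ".join(f"'{value}'" for value in values)
        return f"{column} IN ({quoted})"

    if return_types and not types:
        # First exception: type is missing, so the requested return types
        # are also matched against the type column.
        return f"{in_list('return_type', return_types)} OR {in_list('type', return_types)}"

    clauses = []
    if types:
        clauses.append(in_list("type", types))
    if return_types:
        condition = in_list("return_type", return_types)
        if any(t not in ("Z7", "Z8") for t in types):
            # Second exception: objects that are neither functions nor
            # function calls have a NULL return_type and must still match.
            condition = f"({condition} OR return_type IS NULL)"
        clauses.append(condition)
    # Conditions on different columns are combined with AND.
    return " AND ".join(clauses)
```

For use case 4, `build_condition_1b(["Z4", "Z7"], ["Z4"])` yields a condition equivalent to the one listed above, written with IN instead of chained ORs.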

Pros and cons:

  • ✅ No database updates necessary
  • ❌ The behavior when type is not there is a bit unpredictable, and needs good documentation
  • ✅ Pretty simple changes to API props:
    • Make them lists of strings
    • We could remove strict property, which has always been a bit confusing
  • ❌ There are a number of exceptions in the query building logic, which makes it more prone to bugs and confusion
    • We need to add OR return_type IS NULL whenever any of the requested types is anything other than a Z7 or Z8, and the requested return_type has values.

Option 1.c
  1. change type and return_type to be lists of strings
  2. add new parameter operator (defaults to AND)
  3. when building the query, combine type and return_type conditions with the operator requested by the new parameter

use case 1:

  • type=Z4
  • return_type=Z4
  • operator=OR
  • build sql condition: WHERE return_type='Z4' OR type='Z4'
  • matches objects: Z11, Z12, Z881, Z6884, Z10000, Z10002

use case 2 (return type unbound):
same as Option 1.a

use case 3 (return type bound):
same as Option 1.a

use case 4:

  • type=[ Z7, Z4 ]
  • return_type=null
  • operator=OR
  • build sql condition: WHERE type IN ('Z7', 'Z4')
  • matches objects: Z11, Z12, Z10000, Z10002
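
A sketch of how the operator parameter from Option 1.c might drive the query builder follows. The helper name `build_condition_1c` and the `(sql, params)` return shape are assumptions for illustration, not the actual API code.

```python
def build_condition_1c(types=None, return_types=None, operator="AND"):
    """Combine the type and return_type conditions with the requested operator."""
    if operator not in ("AND", "OR"):
        raise ValueError(f"unsupported operator: {operator}")
    clauses, params = [], []
    for column, values in (("type", types), ("return_type", return_types)):
        if values:
            placeholders = ", ".join("?" for _ in values)
            clauses.append(f"{column} IN ({placeholders})")
            params.extend(values)
    # With a single condition the operator has no effect.
    return f" {operator} ".join(clauses), params
```

Use case 1 would call `build_condition_1c(["Z4"], ["Z4"], "OR")`, and use case 4 would call `build_condition_1c(["Z7", "Z4"], operator="OR")`. The second call filters on type alone, which is why it depends on every persisted Z7 resolving to a Z4.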

Pros and cons:

  • ✅ No database updates necessary
  • ✅ Pretty simple changes to API props:
    • Make them lists of strings
    • Add operator, which defaults to AND
    • We could remove strict property, which has always been a bit confusing
  • ❌ Depends on the assumption that a persisted Z7 will ALWAYS return a Z4

Option 2
  1. Build an advanced filtering system with complex expressions passed as properties
  2. Make no changes to the data represented in the secondary table
  3. Each caller should know how to build the right filter_expression property

use case 1: function input/output types

filter_expression = {
  "or": [
    { "type": "Z4" },
    { "return_type": "Z4" }
  ]
}
  • builds sql condition: WHERE type='Z4' OR return_type='Z4'
  • matches objects: Z11, Z12, Z881, Z6884, Z10000, Z10002

use case 2: function call to function (return type unbound)

filter_expression = { "type": "Z8" }
  • builds sql condition: WHERE type='Z8'
  • matches objects: Z801, Z881, Z866, Z6884

use case 3: function call to function (return type bound)

filter_expression = {
  "and": [
    { "type": "Z8" },
    {
      "or": [
        { "return_type": "Z4" },
        { "return_type": "Z1" }
      ]
    }
  ]
}
  • builds sql condition: WHERE type='Z8' AND ( return_type='Z4' OR return_type='Z1' )
  • matches objects: Z801, Z881, Z6884

use case 4: persisted type or type expression

filter_expression = {
  "or": [
    { "type": "Z4" },
    {
      "and": [
        { "type": "Z7" },
        { "return_type": "Z4" }
      ]
    }
  ]
}
  • builds sql condition: WHERE type='Z4' OR ( type='Z7' AND return_type='Z4' )
  • matches objects: Z11, Z12, Z10000, Z10002
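
One possible way to turn such a filter_expression into SQL is a small recursive compiler, sketched below. The function `compile_filter`, the field whitelist, and the decision to wrap every and/or group in parentheses are all assumptions made for this example. The expression is parsed from a JSON string here, following the "JSON encoded as a string" idea.

```python
import json

# Only these columns may appear as leaf keys (assumed whitelist).
ALLOWED_FIELDS = {"type", "return_type"}


def compile_filter(expr):
    """Compile a nested filter expression into (sql, params)."""
    if not isinstance(expr, dict) or len(expr) != 1:
        raise ValueError("each node must be an object with exactly one key")
    (key, value), = expr.items()
    if key in ("and", "or"):
        if not isinstance(value, list) or not value:
            raise ValueError(f"'{key}' needs a non-empty list of sub-expressions")
        parts, params = [], []
        for child in value:
            child_sql, child_params = compile_filter(child)
            parts.append(child_sql)
            params.extend(child_params)
        # Parenthesize every group so nesting never changes precedence.
        return "(" + f" {key.upper()} ".join(parts) + ")", params
    if key in ALLOWED_FIELDS:
        # Leaf node: compare one column against one bound value.
        return f"{key} = ?", [value]
    raise ValueError(f"unknown key: {key}")


use_case_4 = json.loads(
    '{"or": [{"type": "Z4"}, {"and": [{"type": "Z7"}, {"return_type": "Z4"}]}]}'
)
sql, params = compile_filter(use_case_4)
```

Rejecting unknown keys keeps callers from injecting arbitrary column names, which matters once the expression arrives as a free-form string parameter.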

Pros and cons:

  • ❌ Not sure how ActionAPI can allow this kind of complex parameter
    • Could be a JSON encoded as a string
    • Could just be an advanced property, but keep type and return_type as they are for backwards compatibility.
      • ❓ Would this be justified? This is an internal API used only by the frontend lookup components, so we can change the signature if we wish
  • ✅ This system is infinitely flexible
  • ✅ Requires no changes to the database
  • ❌ Complexity to use the API (and the ApiSandbox) increases significantly

Option 3
  1. We stop using labels table return_type, and instead build this column dynamically by a JOIN with wikilambda_zobject_join
  2. We need to add return type of persisted Z7s in the wikilambda_zobject_join table
  3. But this join would fix the issues from Option 1.b. (NULL values), and we could use the parameters easily as described in Option 1.a.

Pros and cons:

  • ❌ Some work on the querying side
    • The query is pretty simple, just requires one subquery + join
    • All necessary fields are already indexed, no performance drawbacks
    • Then we can implement the Option 1.a solution using the new resolves_to column instead of return_type
  • ✅ We get rid of data duplication added by return_type column
  • ✅ Requires few changes to the parameters, just changing from string to list of strings
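
The idea behind Option 3 can be sketched with an in-memory SQLite database. The schema below is a deliberately simplified stand-in: `return_join` and its columns are hypothetical and do not reproduce the real wikilambda_zobject_join layout. The sketch also assumes that objects without a join row resolve to their own type, and that persisted Z7s already have their return type recorded in the join table.

```python
import sqlite3


def make_db():
    """Create the example data: labels without return_type, plus a join table."""
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE labels (zid TEXT, type TEXT)")
    # Hypothetical, reduced stand-in for wikilambda_zobject_join.
    db.execute("CREATE TABLE return_join (main_zid TEXT, return_zid TEXT)")
    db.executemany("INSERT INTO labels VALUES (?, ?)", [
        ("Z11", "Z4"), ("Z12", "Z4"), ("Z801", "Z8"), ("Z881", "Z8"),
        ("Z866", "Z8"), ("Z6884", "Z8"), ("Z10000", "Z7"), ("Z10002", "Z7"),
    ])
    db.executemany("INSERT INTO return_join VALUES (?, ?)", [
        ("Z801", "Z1"), ("Z881", "Z4"), ("Z866", "Z40"),
        ("Z6884", "Z4"), ("Z10000", "Z4"), ("Z10002", "Z4"),
    ])
    return db


def search(db, types=None, return_types=None):
    """Filter on type and on a resolves_to value computed through the join."""
    # Assumption: with no join row, the object resolves to its own type.
    resolves_to = "COALESCE(j.return_zid, l.type)"
    clauses, params = [], []
    if types:
        clauses.append(f"l.type IN ({', '.join('?' * len(types))})")
        params += types
    if return_types:
        clauses.append(f"{resolves_to} IN ({', '.join('?' * len(return_types))})")
        params += return_types
    where = " AND ".join(clauses) or "1=1"
    query = (
        "SELECT l.zid FROM labels AS l "
        "LEFT JOIN return_join AS j ON j.main_zid = l.zid "
        f"WHERE {where} ORDER BY l.rowid"
    )
    return [zid for (zid,) in db.execute(query, params)]
```

Because the resolved value is never NULL under this assumption, the plain AND aggregation from Option 1.a works without the special cases of Option 1.b.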

Option 4
  1. When persisting a Z7 which resolves to a Z4, instead of storing type=Z7 and return_type=Z4, store its labels as if it were a Z4, so:
    • type=Z4 and return_type=NULL

Pros and cons:

  • ✅ No changes needed in the API
  • ❌ Function calls that resolve to a certain type are indistinguishable from objects of that literal type
    • These are unsearchable in special list pages
    • We break the assumption that the type registered in this column describes the literal syntax of the object. This has unknown and potentially dangerous ramifications for the PHP layer, which depends strongly on assumptions about the structure of a persisted object

Completion checklist

Event Timeline

gengh updated the task description.
gengh changed the task status from Open to In Progress. May 20 2025, 11:10 AM
gengh moved this task from To Triage to 25Q4 (Apr–Jun) on the Abstract Wikipedia team board.
gengh added a project: OKR-Work.
gengh updated the task description.

Change #1149452 had a related patch set uploaded (by Genoveva Galarza; author: Genoveva Galarza):

[mediawiki/extensions/WikiLambda@master] [WIP] Fix label search so that it can aggregate return types from wikilambda_zobject_join

https://gerrit.wikimedia.org/r/1149452

Use case 1:

  • function editor page,
  • select input or output types,
  • types, function calls and functions are shown,
  • because multiple types might be available, we show icon "instance" for those options for which we'll create a reference, and "function call" for those options for which we'll create a function call (functions)

use-case-1_function-editor.gif (682×612 px, 235 KB)

Use case 2:

  • for the key "function" we select a function but the parent key (Z2K2) has no bound type
  • any options are available to pick
  • all returned objects are functions, so no icons are shown

use-case-2_function-type-unbound.gif (403×650 px, 304 KB)

Use case 3:

  • for the key "function" we select a function but the parent key (Z1K1) is bound to type Z4
  • only functions that return Z4 are available to pick (e.g. no echo, no join, but typed list, typed map or typed enum)
  • all returned objects are functions, so no icons are shown

use-case-3_function-type-bound.gif (403×650 px, 388 KB)

Use case 4:

  • key is type bound (Z1K1, needs Z4)
  • only Z4s and Z7s that return Z4s are available
  • because zids belong to multiple types, we show icons in the selector

use-case-4_type-and-enums.gif (506×650 px, 304 KB)

use-case-4_type-Z60.gif (393×650 px, 528 KB)

use-case-4_type-Z8.gif (393×650 px, 257 KB)

Change #1149452 merged by jenkins-bot:

[mediawiki/extensions/WikiLambda@master] Fix label search so that it can aggregate return types from wikilambda_zobject_join

https://gerrit.wikimedia.org/r/1149452