Page MenuHomePhabricator

String objects in (synthetic) Wikidata references can be interpreted as a Typed Map
Closed, ResolvedPublicBUG REPORT

Description

Steps to replicate the issue (include links if applicable):

  • Please see Z24094
  • Note the failing implementation
  • Click details

What happens?:
The Wikidata item reference appears to contain a Typed Map rather than the expected string.

What should have happened instead?:
The expected string should be returned and the test should pass.

Software version (on Special:Version page; skip for WMF-hosted wikis like Wikipedia):

Other information (browser name/version, screenshots, etc.):

This is an artificial recreation of a problem that has since been fixed (or worked around).
Implementation Z23421 returns a (synthetic) Wikidata item reference object containing a String object. This is presumably valid and the implementation passes all the function’s test cases.
When the function was used in composition Z24043, the composition failed with a mysterious error.

IMG_1231.png (2×960 px, 480 KB)

Alternative implementations for the inner function were created, allowing the composition to succeed.
Further investigation revealed that the Python implementation of Z20041 would also fail if asked to return the string within the Wikidata item reference returned by Z23419 (although it, too, was passing all its test cases). This is because the object being returned was interpreted as a Typed Map rather than a String. This too was fixed (by returning the Z6K1).
Sandbox functions have been used to emulate the implementations as they were before being fixed.

  • Sandbox JavaScript emulates Z23419, returning a synthesised Wikidata item reference object
  • Sandbox composition emulates Z24041, with compositions calling either the emulated JavaScript function (Z24074) or the corrected function that it emulates (Z24041).
  • Sandbox Python emulates Z20041. Called with a value returned from Sandbox JavaScript, it returns a Typed Map.

IMG_1232.png (2×960 px, 164 KB)

It seems improper that a String object should ever be interpreted as a Typed Map, particularly in a context that should allow only objects that can resolve to strings (or references).

Event Timeline

Jdforrester-WMF subscribed.

Sounds like a Python-specific executor bug. To investigate.

Hello! The problem is actually in the JS implementation of Sandbox Javascript, which is called in the argument to the call linked in the bug.

The code reads, partly,

	return {
		"Z1K1": {
			"Z1K1": "Z9",
			"Z9K1": "Z6091"
		},
		"Z6091K1": {
			"Z1K1": {
				"Z1K1": "Z9",
				"Z9K1": "Z6"
			},
			"Z6K1": list[Z23419K1]
		}
	}

However, strings are handled by default type conversion. This implementation should instead read

	return new ZObject(
		"Z1K1": {
			"Z1K1": "Z9",
			"Z9K1": "Z6091"
		},
		"Z6091K1": list[Z23419K1]
	}

That should fix the issue.

As a side note,

{
  "Z1K1": { "Z1K1": "Z9", "Z9K1": "Z6" },
  "Z6K1": <something>
}

is never a valid normal-form string. Z6/Strings should only look like

{
  "Z1K1": "Z6",
  "Z6K1": "string contents"
}

As a side note,

{
  "Z1K1": { "Z1K1": "Z9", "Z9K1": "Z6" },
  "Z6K1": <something>
}

is never a valid normal-form string. Z6/Strings should only look like

{
  "Z1K1": "Z6",
  "Z6K1": "string contents"
}

Thanks — that’s what I thought, and it’s how we ended up fixing it (once we’d managed to track it down)

Appreciate the clarification that “presumably correct” in the description should really be “technically incorrect”. I’ve found a few more examples here but it looks like we can nip this one in the bud.

Ah! Apologies, I hadn't seen the fix.

I also see this as a larger class of issues stemming from the lack of documentation on type conversion. This is mostly my fault, since I made a lot of decisions around this stuff during implementation and didn't communicate them well. I'll take this as another cue to do my due diligence 😄.

Appreciate the clarification that “presumably correct” in the description should really be “technically incorrect”. I’ve found a few more examples here but it looks like we can nip this one in the bud.

I've tried to clean up those that I could. A couple of the remaining ones are exemplars, but a couple are ZObject returns that I don't understand.

Appreciate the clarification that “presumably correct” in the description should really be “technically incorrect”. I’ve found a few more examples here but it looks like we can nip this one in the bud.

I've tried to clean up those that I could. A couple of the remaining ones are exemplars, but a couple are ZObject returns that I don't understand.

Same here… and I wrote them! I’ve removed the commented-out section from one. The other, Z22772 (disconnected) should perhaps be raised as a separate bug. The intention was to return a string object rather than a string, since the string would be a canonical reference and is de-referenced. It only half worked, because the string got de-referenced when the function was used in a composition. It now returns a string with a lowercase "z". If amended, it will still return a string like "Z40" in Try this implementation (edit mode) but tests never complete (and they never have).

Appreciate the clarification that “presumably correct” in the description should really be “technically incorrect”. I’ve found a few more examples here but it looks like we can nip this one in the bud.

I've tried to clean up those that I could. A couple of the remaining ones are exemplars, but a couple are ZObject returns that I don't understand.

Same here… and I wrote them! I’ve removed the commented-out section from one. The other, Z22772 (disconnected) should perhaps be raised as a separate bug. The intention was to return a string object rather than a string, since the string would be a canonical reference and is de-referenced. It only half worked, because the string got de-referenced when the function was used in a composition. It now returns a string with a lowercase "z". If amended, it will still return a string like "Z40" in Try this implementation (edit mode) but tests never complete (and they never have).

Ah, in this case, we'd still treat something like Z10000 as a string in the executor. The executors don't operate over canonical form but normal form--with the exception that there is some type conversion magic for certain literals.

I realize that there's actually no way to return a reference from the executors. If you did something like return { "Z1K1": "Z9", "Z9K1": "Z10000" }, you'd get a Z883()/Typed dict as the result. Maybe that's the core bug here (in addition to the noted need for better documentation).

[…snip…]

Thanks, Cory. I’m happy for you to diagnose the bug that way… or not. We’re just calling Z883 “Typed Map” at the moment, hence the Bug report’s title.
The offending implementation’s function is only currently used to support search-term construction, like String for function signature search (Z22973), which is why I opted for the lower-case z option. I’m happy to adjust to some other convention, but I believe extending the Wikidata reference approach is the best route forward.
@99of9 I’m also happy to remove the direct object return from the current version of Z22772, but I’d like to retain the implementation for now, even though Z22765 supersedes it.

[…snip…]

Thanks, Cory. I’m happy for you to diagnose the bug that way… or not. We’re just calling Z883 “Typed Map” at the moment, hence the Bug report’s title.
The offending implementation’s function is only currently used to support search-term construction, like String for function signature search (Z22973), which is why I opted for the lower-case z option. I’m happy to adjust to some other convention, but I believe extending the Wikidata reference approach is the best route forward.
@99of9 I’m also happy to remove the direct object return from the current version of Z22772, but I’d like to retain the implementation for now, even though Z22765 supersedes it.

The linked type proposal is good. We've had informal discussions about things like this, to solve simultaneously 1) the amount of "magic" involved with Z9s and 2) the type ambiguity of bare Z9s. I hope we'll do something like that at some point, but it's a huge change, so unlikely to happen quickly :-/.

Would it be useful if the executors allowed for something similar to the support we currently have for ZPair? For example, would a syntax like

def  Z1XXXX():
    return ZReference('Z1YYYY')

be appealing and/or helpful?

[…snip…]

[…snip…]

The linked type proposal is good. We've had informal discussions about things like this, to solve simultaneously 1) the amount of "magic" involved with Z9s and 2) the type ambiguity of bare Z9s. I hope we'll do something like that at some point, but it's a huge change, so unlikely to happen quickly :-/.

Would it be useful if the executors allowed for something similar to the support we currently have for ZPair? For example, would a syntax like

def  Z1XXXX():
    return ZReference('Z1YYYY')

be appealing and/or helpful?

To add to your woes, I don’t believe the “pair” support is documented. Someone even created a make pair (Z17534) function but only allowed for the types, not the paired objects. I’ve not tried to do anything about it because Typed Pair objects are not officially supported yet.
I think the same applies to Z9/references, but if they behave themselves properly in relation to Z803, I’m happy to experiment with returning them from code in some manner such as you propose (thank you).
Thank you for taking a look at the Type Proposal. Whilst I understand that fixing Z9s once and for all is a significant change, I think we could make good use of a wrapped ZID when the time comes to extend light-weight enumerations to Wikifunctions objects. In the meantime, we can continue to smuggle them around in cases where exposing the Z9K1 might lead to spontaneous de-referencing (assuming you make ZReference available).

DSantamaria changed the task status from Open to In Progress.Apr 29 2025, 10:34 AM

I'm taking care of the documentation now.

I have also realized, with some chagrin, that the ZReference syntax already exists, so you can start adapting existing implementations now.

[…snip…]

[…snip…]

The linked type proposal is good. We've had informal discussions about things like this, to solve simultaneously 1) the amount of "magic" involved with Z9s and 2) the type ambiguity of bare Z9s. I hope we'll do something like that at some point, but it's a huge change, so unlikely to happen quickly :-/.

Would it be useful if the executors allowed for something similar to the support we currently have for ZPair? For example, would a syntax like

def  Z1XXXX():
    return ZReference('Z1YYYY')

be appealing and/or helpful?

To add to your woes, I don’t believe the “pair” support is documented. Someone even created a make pair (Z17534) function but only allowed for the types, not the paired objects. I’ve not tried to do anything about it because Typed Pair objects are not officially supported yet.
I think the same applies to Z9/references, but if they behave themselves properly in relation to Z803, I’m happy to experiment with returning them from code in some manner such as you propose (thank you).
Thank you for taking a look at the Type Proposal. Whilst I understand that fixing Z9s once and for all is a significant change, I think we could make good use of a wrapped ZID when the time comes to extend light-weight enumerations to Wikifunctions objects. In the meantime, we can continue to smuggle them around in cases where exposing the Z9K1 might lead to spontaneous de-referencing (assuming you make ZReference available).

On "make pair": that looks like a manifestation of a separate bug which causes issues when returning Z4/Types from a top-level function. We're investigating that separately.

For the present issue: documentation is underway here. Still working on it, but it hopefully covers this case.

@GrounderUK, is there more to address on this particular task, or are you okay with us closing it?

Thanks for the documentation, that looks useful. At the end it mentions ZPair and dict. Are those considered supported yet, or should we still hold off on writing functions using such features?

Those should be supported! If you're finding cases where they don't work, that's a bug :)

Well, let me qualify: Z882(x, y) (Pair types) and Z883(x, y) (Map types) are differently supported in different parts of the system. If you write Python/JS code that ingests or produces these types, that should work in the evaluator and orchestrator. The interface should also handle them. I'm not sure on the state of these things in other parts of the system. For example, I'm not sure how easy it is to define a function directly that takes these types as input. But as intermediate values to be used within compositions, they should be perfectly usable.

I had a quick look at the existing attempts, and it seems like https://www.wikifunctions.org/view/en/Z17534 is not yet working well enough to base things on.

The problem with Z17534 is that it's trying to return a Type directly, which has some problems (addressed in a separate Phabricator task). Is there a function that tries to return an instance of a pair type (rather than the pair type itself)?

I've made a new task to capture the "make pair" issue (which is more related to types as top-level return objects): https://phabricator.wikimedia.org/T394313.

I'm closing this task. However, I know that there are still lots of issues with type converters. Please feel free to open more tasks as problems arise 😄 .