
Investigate wbformatvalue behaviour regarding datatype and datavalue type
Closed, Resolved, Public, 3 Story Points

Description

In T207479 some traces were identified showing exceptions.

During the task breakdown we decided the ticket probably needed a different fix than just catching the exceptions.

SPIKE 4 hours

There are various combinations that need to be checked:

  • datavalue type, no datatype
  • matching types
  • mismatching types

Conclusions
For values that look like entity IDs, the datavalue type should be wikibase-entityid.

  • Everything works as planned when the value is an entity ID and the matching datatype is supplied.
  • For mismatched types:
    • If the datatype is unknown, the API returns an unknown_datatype error.
    • If the datatype is one of monolingualtext, quantity, string, time, external-id, wikibase-item or wikibase-property and the ID is an item or property ID, the API returns a link to the entity with human-readable text.
    • If the datatype is one of monolingualtext, quantity, string, time, external-id, wikibase-item or wikibase-property and the ID is a lexeme ID (including form and sense sub-IDs), the API returns a link to the lexeme only, with the ID instead of human-readable text.
    • If the datatype is one of url, commonsMedia, geo-shape or tabular-data, the API returns internal_api_error_InvalidArgumentException.
    • If the datatype is wikibase-lexeme but the ID is not a lexeme ID, the API returns internal_api_error_InvalidArgumentException.
    • If the datatype is wikibase-form and an existing lexeme or sense ID is provided, a fatal error occurs.
    • If the datatype is wikibase-form and a non-existent sense ID is provided, the API returns the ID marked as a deleted sense.
    • If the datatype is wikibase-form and a non-existent lexeme ID is provided, the API returns internal_api_error_InvalidArgumentException.
    • If the datatype is wikibase-sense and a non-sense ID is provided, a fatal error occurs.

Event Timeline

Addshore triaged this task as Normal priority. Nov 13 2018, 2:54 PM
Addshore created this task.
Addshore renamed this task from "Investigate wbformatvalue behaviour when datatype isn't provided" to "Investigate wbformatvalue behaviour regarding datatype and datavalue type". Nov 13 2018, 2:57 PM
Addshore added a project: Spike.
Addshore updated the task description.

Some examples of current behavior:

  • form/sense ID value, lexeme datatype: InvalidArgumentException
  • form/sense ID value, sense/form datatype: PHP fatal error
  • lexeme ID value, sense/form datatype: PHP fatal error
  • lexeme/sense/form ID value, item datatype: generic link (lexeme target, ID text)
  • lexeme ID value, string datatype: generic link
  • lexeme ID value, url datatype: InvalidArgumentException

It’s a mess.

Tarrow added a comment. Nov 20 2018, 10:20 AM

I've tested various combinations of possible entries in the "datavalue" parameter, i.e. varying the specified type and the ID string in datavalue={"type":"$TYPE", "value": {"id":"$VALUE"}}, while keeping the "datatype" parameter fixed as wikibase-item,

with the following possible values:

TYPES="bad globecoordinate monolingualtext quantity string url commonsMedia geo-shape tabular-data time wikibase-entityid wikibase-unmapped-entityid external-id wikibase-lexeme wikibase-form wikibase-sense wikibase-item wikibase-property"

VALUES="L10-F3 L10-S3 L10 P10 Q100"
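The test matrix above can be generated mechanically. The following is a minimal Python sketch, not the script actually used for this investigation: it only builds the query parameters for each combination and sends nothing. The helper names are hypothetical, and `format=json` is the standard MediaWiki API output selector, added here as an assumption.

```python
import json
from itertools import product

# Copied from the TYPES and VALUES shell variables above.
TYPES = (
    "bad globecoordinate monolingualtext quantity string url commonsMedia "
    "geo-shape tabular-data time wikibase-entityid wikibase-unmapped-entityid "
    "external-id wikibase-lexeme wikibase-form wikibase-sense wikibase-item "
    "wikibase-property"
).split()
VALUES = "L10-F3 L10-S3 L10 P10 Q100".split()


def build_params(dv_type, entity_id, datatype="wikibase-item"):
    """Query parameters for one wbformatvalue call (hypothetical helper; nothing is sent)."""
    datavalue = {"type": dv_type, "value": {"id": entity_id}}
    return {
        "action": "wbformatvalue",
        "format": "json",  # assumption: standard MediaWiki API output selector
        "datavalue": json.dumps(datavalue),
        "datatype": datatype,
    }


def datavalue_type_matrix():
    """All (datavalue type, ID) pairs, with the datatype fixed to wikibase-item."""
    return [build_params(t, v) for t, v in product(TYPES, VALUES)]
```

For the second round of tests described below (datavalue type fixed to wikibase-entityid, datatype varied), the same helper can be called as `build_params("wikibase-entityid", value, datatype)`. Each dict could then be URL-encoded with `urllib.parse.urlencode` and sent to the wiki's api.php.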

Every combination except $TYPE == wikibase-entityid resulted in an internal_api_error_InvalidArgumentException.

The wikibase-entityid combinations resulted in the following:

Trying value: L10-F3 and type: wikibase-entityid {"result":"<a title=\"Lexeme:L10\" href=\"/wiki/Lexeme:L10\">L10-F3</a>"}
Trying value: L10-S3 and type: wikibase-entityid {"result":"<a title=\"Lexeme:L10\" href=\"/wiki/Lexeme:L10\">L10-S3</a>"}
Trying value: L10 and type: wikibase-entityid {"result":"<a title=\"Lexeme:L10\" href=\"/wiki/Lexeme:L10\">L10</a>"}
Trying value: P10 and type: wikibase-entityid {"result":"<a title=\"Property:P10\" href=\"/wiki/Property:P10\">video</a>"}
Trying value: Q100 and type: wikibase-entityid {"result":"<a title=\"Q100\" href=\"/wiki/Q100\">Boston</a>"}

Omitting the type field from the JSON entirely results in "DataValue type is missing" errors.

If we instead fix "type" in the datavalue JSON blob to wikibase-entityid and vary the "datatype" URL parameter, we see the following results.

Setting the datatype parameter to wikibase-item or wikibase-property generally results in a sensible response, regardless of whether the provided ID is actually an item or property ID. However, the link text is only "human readable" for items and properties.

See the following:

Trying value: L10-F3 and type: wikibase-item {"result":"<a title=\"Lexeme:L10\" href=\"/wiki/Lexeme:L10\">L10-F3</a>"}
Trying value: L10-S3 and type: wikibase-item {"result":"<a title=\"Lexeme:L10\" href=\"/wiki/Lexeme:L10\">L10-S3</a>"}
Trying value: L10 and type: wikibase-item {"result":"<a title=\"Lexeme:L10\" href=\"/wiki/Lexeme:L10\">L10</a>"}
Trying value: P10 and type: wikibase-item {"result":"<a title=\"Property:P10\" href=\"/wiki/Property:P10\">video</a>"}
Trying value: Q100 and type: wikibase-item {"result":"<a title=\"Q100\" href=\"/wiki/Q100\">Boston</a>"}

Trying value: L10-F3 and type: wikibase-property {"result":"<a title=\"Lexeme:L10\" href=\"/wiki/Lexeme:L10\">L10-F3</a>"}
Trying value: L10-S3 and type: wikibase-property {"result":"<a title=\"Lexeme:L10\" href=\"/wiki/Lexeme:L10\">L10-S3</a>"}
Trying value: L10 and type: wikibase-property {"result":"<a title=\"Lexeme:L10\" href=\"/wiki/Lexeme:L10\">L10</a>"}
Trying value: P10 and type: wikibase-property {"result":"<a title=\"Property:P10\" href=\"/wiki/Property:P10\">video</a>"}
Trying value: Q100 and type: wikibase-property {"result":"<a title=\"Q100\" href=\"/wiki/Q100\">Boston</a>"}

Valid JSON responses are returned, but containing an error with code unknown_datatype, for the following datatypes:

  • bad
  • globecoordinate
  • wikibase-entityid
  • wikibase-unmapped-entityid

An example response is:

{"error":{"code":"unknown_datatype","info":"Unrecognized value for parameter \"datatype\": wikibase-entityid.","*":"See https://www.wikidata.org/w/api.php for API usage. Subscribe to the mediawiki-api-announce mailing list at &lt;https://lists.wikimedia.org/mailman/listinfo/mediawiki-api-announce&gt; for notice of API deprecations and breaking changes."},"servedby":"mw1348"}

The same error is also returned for an arbitrary value such as foobarwrongtype.

Valid JSON responses are returned, but containing an error with code internal_api_error_InvalidArgumentException, for the following datatypes:

  • url
  • commonsMedia
  • geo-shape
  • tabular-data

This error is also returned for datatype=wikibase-lexeme when the ID to be formatted is not a valid lexeme ID.

An example response is:

Trying value: L10-F3 and type: wikibase-lexeme {"error":{"code":"internal_api_error_InvalidArgumentException","info":"[W-PURwpAIDsAAIzIQboAAADF] Caught exception of type InvalidArgumentException"},"servedby":"mw1347"}
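When running such a matrix, it helps to sort every response into one of the outcomes seen so far: a result, an API error code (unknown_datatype, internal_api_error_InvalidArgumentException), or a fatal. A minimal Python sketch follows; the function name is hypothetical, and it assumes that a fatal shows up as the HTML WMF error page rather than as API JSON.

```python
import json


def classify(body):
    """Summarise a wbformatvalue response body as "ok", an error code, or "fatal"."""
    try:
        data = json.loads(body)
    except json.JSONDecodeError:
        # Assumption: a PHP fatal produces the HTML WMF error page, not API JSON.
        return "fatal"
    if not isinstance(data, dict):
        return "unexpected"
    if "result" in data:
        return "ok"
    # Error responses look like {"error": {"code": ..., "info": ...}, ...}
    return data.get("error", {}).get("code", "unexpected")
```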

The WMF error page is shown when datatype=wikibase-form or wikibase-sense but the ID value string is not a valid form or sense ID. Details of the differences follow.

Finally, omitting the datatype parameter results in a 'correct' response, but still with nice human-readable text only for items and properties:

Trying value: L10-F3 and type: {"result":"<a title=\"Lexeme:L10\" href=\"/wiki/Lexeme:L10\">L10-F3</a>"}
Trying value: L10-S3 and type: {"result":"<a title=\"Lexeme:L10\" href=\"/wiki/Lexeme:L10\">L10-S3</a>"}
Trying value: L10 and type: {"result":"<a title=\"Lexeme:L10\" href=\"/wiki/Lexeme:L10\">L10</a>"}
Trying value: P10 and type: {"result":"<a title=\"Property:P10\" href=\"/wiki/Property:P10\">video</a>"}
Trying value: Q100 and type: {"result":"<a title=\"Q100\" href=\"/wiki/Q100\">Boston</a>"}

The same pattern (success, but with generic links for everything except items and properties) also occurs with the datatype parameter set to the following:

  • monolingualtext
  • quantity
  • string
  • time
  • external-id
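To tell "generic" links (the ID as link text) apart from links with human-readable text, a rough check is to compare the anchor text with the requested ID. The Python sketch below is a hypothetical helper; its regular expression is only a heuristic for the simple HTML shown in these results, not a general HTML parser.

```python
import re

# Heuristic: capture the text of the first <a ...>...</a> element.
ANCHOR_TEXT = re.compile(r"<a\b[^>]*>(.*?)</a>", re.DOTALL)


def link_kind(result_html, entity_id):
    """Return "generic" if the link text is the bare ID, "labelled" otherwise."""
    match = ANCHOR_TEXT.search(result_html)
    if match is None:
        return "no link"
    return "generic" if match.group(1).strip() == entity_id else "labelled"
```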

"Perfect" links are only returned for forms when the datatype is explicitly specified as wikibase-form and the ID is valid, e.g.:
value: L10-F3 and type: wikibase-form {"result":"<a href=\"/wiki/Lexeme:L10#F3\">describing</a>"}

A "perfect" result is also returned when wikibase-form is passed as the datatype with a deleted sense ID, e.g.:
value: L10-S3 and type: wikibase-form {"result":"L10-S3 <span class=\"wb-entity-undefinedinfo\">(Deleted Sense)</span>"}

However, if a valid (existing) sense ID is passed with datatype=wikibase-form, a WMF error page is shown with the following error:
Call to undefined method Wikibase\Lexeme\Domain\Model\Sense::getRepresentations()

A valid sense with datatype=wikibase-sense results in success.

Passing a deleted form ID results in a WMF error page.

This is currently in Peer Review, but I’m not sure what’s to review here. Do we need to decide what to do next, given the results listed in the task description?

Yes, I think the next step should be deciding/defining the expected behavior in these situations. Does this API have documentation from which we could infer the expected behavior? Without knowing the use-cases and tools, having two type definitions in the request seems somewhat redundant?

It’s not completely redundant, because there are more data types than data value types. For example, the data types “string”, “external identifier”, “URL”, “Commons media”, and a few more all share the data value type “string”, and the data types “item”, “property”, “lexeme”, “sense” and “form” all share the data value type “Wikibase entity ID”. But they’re not independent either, and in fact for entity IDs it’s always possible to infer the data type from the data value, since entity IDs of different entity types have different formats.
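The point that the datatype can be inferred from an entity ID can be sketched as a simple pattern lookup. This Python helper is hypothetical and only covers the five ID shapes used in this task (Q100, P10, L10, L10-F3, L10-S3); the regular expressions are assumptions, not Wikibase's own ID parsers, and other entity types are not handled.

```python
import re

# Assumed ID shapes, based only on the IDs used in this task.
_ID_PATTERNS = [
    (re.compile(r"L[1-9]\d*-F[1-9]\d*"), "wikibase-form"),
    (re.compile(r"L[1-9]\d*-S[1-9]\d*"), "wikibase-sense"),
    (re.compile(r"L[1-9]\d*"), "wikibase-lexeme"),
    (re.compile(r"P[1-9]\d*"), "wikibase-property"),
    (re.compile(r"Q[1-9]\d*"), "wikibase-item"),
]


def infer_datatype(entity_id):
    """Map an entity ID string to the matching datatype, or raise ValueError."""
    for pattern, datatype in _ID_PATTERNS:
        # fullmatch: the whole ID must match, so "L10-F3" never matches the lexeme pattern
        if pattern.fullmatch(entity_id):
            return datatype
    raise ValueError(f"Unrecognised entity ID: {entity_id!r}")
```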

[...] there are more data types than data value types. [...]

When do we need the "data value type"? Are there data types that could reasonably have multiple data value types?

No, but I think the data value is supposed to be useful on its own as well, that’s why it has its own type information.

So, after reading this investigation and looking into the root inconsistencies, I see a few things.

When formatting, one of three things happens:

  • the formatting happens correctly, and we get something formatted
  • the ValueFormatter that is used throws a FormattingException or an InvalidArgumentException
  • an EntityIdFormatter wrapped in an EntityIdValueFormatter fatals due to type hints (the new case we are seeing with our Lexeme code)

The EntityIdFormatter interface currently doesn't say what should happen when a bad value is passed to it and cannot be formatted.
Conventions within wikibase probably dictate that this should be an InvalidArgumentException.

The EntityIdValueFormatter already has a small bit of code checking that the value passed is roughly the correct type:

		if ( !( $value instanceof EntityIdValue ) ) {
			throw new InvalidArgumentException( 'Data value type mismatch. Expected an EntityIdValue.' );
		}

before trying to call ->format with the internal formatter.

So if the EntityIdFormatter implementations actually threw InvalidArgumentExceptions when given things they can't handle, we would have a single, consistent exception bubbling up for the API to deal with, rather than fatals etc.
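A sketch of what "throwing instead of fataling" could look like for a form-only formatter, written in Python for illustration, with `ValueError` standing in for PHP's InvalidArgumentException. `FormId`, `SenseId` and `FormIdFormatter` are hypothetical stand-ins, not the Wikibase Lexeme classes; the point is only that the type check runs before any form-specific code, which is where the real `Sense::getRepresentations()` fatal came from.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FormId:
    serialization: str  # e.g. "L10-F3"


@dataclass(frozen=True)
class SenseId:
    serialization: str  # e.g. "L10-S3"


class FormIdFormatter:
    """Hypothetical form-only formatter that rejects IDs it cannot handle."""

    def format_entity_id(self, entity_id):
        # Check the ID type up front and raise ValueError (standing in for
        # InvalidArgumentException) instead of letting a SenseId reach
        # form-only code.
        if not isinstance(entity_id, FormId):
            raise ValueError(f"Expected a FormId, got {type(entity_id).__name__}")
        lexeme_id, form_part = entity_id.serialization.split("-")
        # The real formatter shows the form's representation as link text;
        # this sketch has no lexeme data, so it uses the ID instead.
        return f'<a href="/wiki/Lexeme:{lexeme_id}#{form_part}">{entity_id.serialization}</a>'
```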

It would probably make sense for the ValueFormatter and EntityIdFormatter interfaces to each throw their own flavour of InvalidArgumentException (although this might be overkill).
EntityIdValueFormatter could then catch the EntityIdFormatter flavour and rethrow it as the ValueFormatter flavour (which could even be a FormattingException?).
Or we just make these InvalidArgumentExceptions consistent and catch them in the API execute method.
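The flow proposed above (an inner formatter error rethrown as the outer formatter's own flavour, with a single catch in the API) could look roughly like this. It is a Python sketch with hypothetical names: `FormattingError` stands in for a formatter-specific InvalidArgumentException subclass, data values are plain dicts mirroring the datavalue JSON, and the `formatting-failed` error code is invented for illustration (the real API currently reports internal_api_error_InvalidArgumentException). Because `execute` catches the base `ValueError`, any subclass a formatter throws is handled the same way.

```python
class FormattingError(ValueError):
    """Stand-in for a formatter-specific InvalidArgumentException subclass."""


class EntityIdValueFormatter:
    """Sketch of the wrapper: check the value type, then delegate to an inner formatter."""

    def __init__(self, format_id):
        self.format_id = format_id  # callable: entity ID string -> formatted text

    def format(self, value):
        if value.get("type") != "wikibase-entityid":
            raise FormattingError("Data value type mismatch. Expected an EntityIdValue.")
        try:
            return self.format_id(value["value"]["id"])
        except ValueError as exc:
            # Rethrow the inner formatter's error as this formatter's own flavour.
            raise FormattingError(str(exc)) from exc


def execute(formatter, value):
    """Hypothetical API entry point: one except clause covers every ValueError subclass."""
    try:
        return {"result": formatter.format(value)}
    except ValueError as exc:
        return {"error": {"code": "formatting-failed", "info": str(exc)}}
```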

Thoughts?

Thought: if we say that any subclass of InvalidArgumentException is okay to be thrown by value formatters, then value formatters can use the Wikimedia\Assert library if they want.