Token with (automatically created) entities breaks the entities when edited and saved
Closed, Resolved · Public

Description

MW version 1.34.2
page forms version: 4.9.5

When saving a page form containing a property with tokens, some characters are automatically replaced with entities, such as Ø -> &Oslash;. When the item is edited and saved again, the ampersand is itself expanded to &amp;, which breaks the first entity (it becomes &amp;Oslash;).

Here is a minimal reproduction of the issue on the Semantic MediaWiki sandbox.

Form: https://sandbox.semantic-mediawiki.org/w/index.php?title=Formulaire:Example_token_entities_form_bug
Edited Item: https://sandbox.semantic-mediawiki.org/wiki/Page_forms_token_error (Page_forms_token_error)

Example

From the form, create a new item and add the value øysters to the property Example_tokens.
The form saves successfully, and the template looks like this if you visit the source tab:

{{Example token entities
|example_tokens=&oslash;ysters
}}

Go to the form again, enter the same page title, and click edit or create. Then click save page.

The output is saved as:

{{Example token entities
|example_tokens=&amp;oslash;ysters
}}

The & in &oslash; has now been replaced with &amp;, invalidating the HTML entity that was created on the first save.
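The two saves can be reproduced outside the wiki with a short PHP sketch. It assumes the save step passes the field value through htmlentities with its default double encoding, which matches the symptom described here. `encodeOnSave` is a hypothetical stand-in, not the actual Page Forms code, and the flags and charset are passed explicitly only so the result does not depend on the PHP configuration.

```php
<?php
// Hypothetical stand-in for the form's save step (not the real Page Forms function).
function encodeOnSave( string $value ): string {
	// double_encode defaults to true, so existing entities are encoded again.
	return htmlentities( $value, ENT_COMPAT | ENT_HTML401, 'UTF-8' );
}

// "\u{00F8}" is the character ø, written as an escape to avoid file-encoding issues.
$firstSave = encodeOnSave( "\u{00F8}ysters" );
// Editing the page feeds the stored wikitext value back through the same step.
$secondSave = encodeOnSave( $firstSave );

echo $firstSave, "\n";  // &oslash;ysters
echo $secondSave, "\n"; // &amp;oslash;ysters
```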

Event Timeline

Same problem on my wiki with German umlauts (ä,ö,ü).
Before saving, the property value was Ern&auml;hrung (Ernährung); after saving, it is Ern&amp;auml;hrung.
In my case it's a property of type "page" and I use "arraymap" to set it.

Also running MW 1.34.2 and PageForms 4.9.5

I can also confirm this. MW 1.31, PF 4.9.5.

If you enter "für" in a tokens field (from an autocompleted value), it gets saved as "f&uuml;r" in the page wikitext.

I think it was introduced in https://github.com/wikimedia/mediawiki-extensions-PageForms/commit/1b67fc6ced342237efe227af2ccab9a93272c88a which was related to T259433

If I change the line so that it first calls html_entity_decode:

return htmlentities( html_entity_decode( implode( "$delimiter ", $value ) ) );

it saves without doubling the entities.
Does this have any unintended consequences?
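One way to probe for unintended consequences is to check that the decode-then-encode variant is stable across repeated saves. The sketch below is illustrative only: `encodeTokens` is a hypothetical wrapper around the proposed line, the default delimiter is an assumption, and the flags and 'UTF-8' charset are passed explicitly so the result does not depend on the PHP configuration.

```php
<?php
// Hypothetical wrapper around the proposed line; $value and $delimiter mirror
// the variable names in that line, everything else is assumed for illustration.
function encodeTokens( array $value, string $delimiter = ',' ): string {
	$flags = ENT_COMPAT | ENT_HTML401;
	$joined = implode( "$delimiter ", $value );
	// Decode first so that entities from an earlier save are not encoded again.
	$decoded = html_entity_decode( $joined, $flags, 'UTF-8' );
	return htmlentities( $decoded, $flags, 'UTF-8' );
}

// First save: raw characters typed into the tokens field (ø and ü as escapes).
$firstSave = encodeTokens( [ "\u{00F8}ysters", "f\u{00FC}r" ] );
// Second save: the stored values are split and passed through again.
$secondSave = encodeTokens( explode( ', ', $firstSave ) );

echo $firstSave, "\n";                    // &oslash;ysters, f&uuml;r
var_dump( $firstSave === $secondSave );   // bool(true)
```

One side effect of this variant: a user who types the literal text &oslash; can no longer be told apart from one who types ø, because both end up stored as &oslash;.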

For SMW: when I export as RDF, the value with the HTML entity appears to be resolved to the correct page/URI, so I don't think saving the value with entities in the wikitext is an issue?

It is probably better to set the last boolean parameter, double_encode, to false:

return htmlentities( implode( "$delimiter ", $value ), ENT_COMPAT | ENT_HTML401, ini_get( "default_charset" ), false );

Is there a way to use the default values for parameters 2 and 3 without passing them explicitly?
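On PHP 8.0 or later, named arguments allow exactly this: `double_encode` can be set while the flags and encoding keep their defaults. That is likely newer than the PHP version these wikis run, so treat the following as a sketch of the language feature, not as a drop-in change for Page Forms. On older PHP versions the two middle parameters still have to be passed positionally.

```php
<?php
// Requires PHP 8.0+. The named argument skips $flags and $encoding,
// so both keep whatever defaults the running PHP version defines.
$stored = '&oslash;ysters';
$resaved = htmlentities( $stored, double_encode: false );

// A bare ampersand that is not part of an entity is still encoded.
$fresh = htmlentities( 'R&D', double_encode: false );

echo $resaved, "\n"; // &oslash;ysters
echo $fresh, "\n";   // R&amp;D
```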

Sorry about the problem! I checked in what I think is a fix for this, a few hours ago. @Oeyvindg - it was indeed that line; I made a different fix, but hopefully it accomplishes the same thing.

Oeyvindg claimed this task.

Thank you, I have updated and it works.

Fixed it for my wiki too, thanks a lot!