
Token with (automatically created) entities breaks the entities when edited and saved
Closed, Resolved, Public

Description

MW version 1.34.2
page forms version: 4.9.5

When saving a page form containing a property with tokens, some characters are automatically replaced with entities, such as Ø -> &Oslash;. When the item is edited and saved again, the ampersand is expanded to &amp;, giving &amp;Oslash; and breaking the first entity.

Here is a minimal reproduction of the issue on the Semantic Mediawiki sandbox.

Form: https://sandbox.semantic-mediawiki.org/w/index.php?title=Formulaire:Example_token_entities_form_bug
Edited Item: https://sandbox.semantic-mediawiki.org/wiki/Page_forms_token_error (Page_forms_token_error)

Example

From form create a new Item and add to property Example_tokens: øysters.
The form saves successfully, and the template looks like this if you visit the source tab:

{{Example token entities
|example_tokens=&oslash;ysters
}}

Go to form, write page title, click edit or create. Click save page.

Output is saved as

{{Example token entities
|example_tokens=&amp;oslash;ysters
}}

The & in &oslash; has now been replaced with &amp;, invalidating the HTML entity that was created on the first save.
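The double encoding can be reproduced outside the wiki with a short PHP sketch. It assumes the save path runs htmlentities() over the joined token values, as the comments further down suggest; saveTokens() is a hypothetical stand-in, not the actual Page Forms function.

```php
<?php
// Hypothetical stand-in for the form's save step (not the real Page Forms code):
// join the token values and entity-encode the result.
function saveTokens( array $value, string $delimiter = ',' ): string {
	return htmlentities( implode( "$delimiter ", $value ), ENT_COMPAT | ENT_HTML401, 'UTF-8' );
}

// First save: the raw character is turned into a named entity.
$firstSave = saveTokens( [ "\u{00F8}ysters" ] );

// Edit and save again: the already-encoded text is fed back in unchanged,
// so the leading "&" of the entity is itself encoded.
$secondSave = saveTokens( [ $firstSave ] );

echo $firstSave, "\n";  // &oslash;ysters
echo $secondSave, "\n"; // &amp;oslash;ysters
```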

Event Timeline

Oeyvindg created this task. Sep 3 2020, 10:54 AM
Restricted Application added a subscriber: Aklapper. Sep 3 2020, 10:54 AM
Oeyvindg updated the task description. Sep 3 2020, 11:02 AM
Stefahn added a subscriber: Stefahn. Sep 7 2020, 2:32 PM

Same problem on my wiki with German umlauts (ä,ö,ü).
Before saving, the property value was Ernährung (Ern&auml;hrung); after saving it is Ern&amp;auml;hrung.
In my case it's a property of type "page" and I use "arraymap" to set it.

Also running MW 1.34.2 and PageForms 4.9.5

Krabina added a subscriber: Krabina. Edited Sep 9 2020, 9:44 AM

I can also confirm this. MW 1.31, PF 4.9.5.

You enter "für" in a tokens field (from an autocompleted value), and it gets saved as "f&uuml;r" in the page wikitext.

Oeyvindg added a comment. Edited Sep 9 2020, 2:26 PM

I think it was introduced in https://github.com/wikimedia/mediawiki-extensions-PageForms/commit/1b67fc6ced342237efe227af2ccab9a93272c88a which was related to T259433

If I change the line to this where I first call html_entity_decode

return htmlentities( html_entity_decode( implode( "$delimiter ", $value ) ) );

it saves without doubling on entities.
Does this have any unintended consequences?
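A minimal sketch of the decode-then-encode idea, to show that it is stable across repeated saves. encodeTokens() is a hypothetical wrapper around the line quoted above, with the flags and charset written out explicitly so the result does not depend on php.ini.

```php
<?php
// Hypothetical wrapper around the proposed line (not the actual Page Forms function).
function encodeTokens( array $value, string $delimiter = ',' ): string {
	$joined = implode( "$delimiter ", $value );
	// Undo any entities left over from a previous save, then encode exactly once.
	$decoded = html_entity_decode( $joined, ENT_COMPAT | ENT_HTML401, 'UTF-8' );
	return htmlentities( $decoded, ENT_COMPAT | ENT_HTML401, 'UTF-8' );
}

$once  = encodeTokens( [ "\u{00F8}ysters", "f\u{00FC}r" ] );
$twice = encodeTokens( [ $once ] ); // simulate editing and saving the stored text again

echo $once, "\n";  // &oslash;ysters, f&uuml;r
echo $twice, "\n"; // &oslash;ysters, f&uuml;r
```

One trade-off of this approach: a value typed literally as entity text (for example the characters "&oslash;") is treated as the character it names rather than being kept as literal text.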

For SMW: When I try to export as RDF, it looks like the value with the html entity is understood as the correct page/uri, so I don't think saving the value using entities in the wikitext is an issue?

Probably better to set the last boolean parameter double_encode to false:

return htmlentities( implode( "$delimiter ", $value ), ENT_COMPAT | ENT_HTML401, ini_get( "default_charset" ), false );

Is there a way to use the default values for parameters 2 and 3 without adding them back in?
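For reference, a sketch of both call styles. With positional arguments the flags and charset have to be spelled out to reach the fourth parameter. PHP 8.0 and later support named arguments, which allow skipping them, but that syntax is a parse error on older PHP versions, so it is only an option where PHP 8+ can be assumed. The sample string is made up for illustration.

```php
<?php
// Stored value that already contains one entity plus one raw character.
$stored = "&oslash;ysters, f\u{00FC}r";

// Positional form: parameters 2 and 3 must be given to reach double_encode.
$positional = htmlentities( $stored, ENT_COMPAT | ENT_HTML401, 'UTF-8', false );

// Named-argument form (PHP 8.0+ only): flags and encoding keep their defaults.
// The default encoding follows the default_charset ini setting (assumed UTF-8 here).
$named = htmlentities( $stored, double_encode: false );

echo $positional, "\n"; // &oslash;ysters, f&uuml;r
echo $named, "\n";
```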

Sorry about the problem! I checked in what I think is a fix for this, a few hours ago. @Oeyvindg - it was indeed that line; I made a different fix, but hopefully it accomplishes the same thing.

solved it for me, thank you!

Oeyvindg closed this task as Resolved. Sep 10 2020, 6:33 AM
Oeyvindg claimed this task.

Thank you, have updated and it works.

Fixed it for my wiki too, thanks a lot!