PF checkboxes do not parse umlaut correctly
Closed, Resolved · Public · Bug Report

Description

Steps to replicate the issue (include links if applicable):

  • Open https://religion-in-japan.univie.ac.at/Handbuch/Spezial:Abfrage_ausf%C3%BChren/Artist?pfRunQueryFormName=Artist&Artist%5Btime%5D=Neuzeit
  • Select "Dürer; Albrecht" and run the query.

What happens?:

A no-result message is shown.

What should have happened instead?:

This page uses a Page Forms (PF) query form (see also https://religion-in-japan.univie.ac.at/Handbuch/Formular:Artist). It should show a thumbnail image and a link to the corresponding page in namespace 6 (File). If I use "radiobutton" instead of "checkboxes" as the input type for this field, the query runs correctly, as it does for all other values that contain no special characters such as the umlaut ü.

Software version (skip for WMF-hosted wikis like Wikipedia):
MediaWiki 1.35.5
PHP 8.0.19 (fpm-fcgi)
MariaDB 10.2.44-MariaDB
Semantic MediaWiki 3.2.3

Other information (browser name/version, screenshots, etc.):

The same problem occurs with values that include HTML-relevant characters such as ",": with radiobutton the query works, with checkboxes it does not. Something seems to go wrong when the checkboxes input handles special characters.

There is also a difference in the query link:

  • %26artist_a%3DD%25C3%25BCrer%253B%2BAlbrecht ... (checkboxes) does not work
  • %5Bartist_a%5D=Dürer%3B+Albrecht ... (radiobutton) works
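A quick way to compare the two links is to percent-decode each fragment until it stops changing. The Python sketch below is only a diagnostic, not Page Forms code, and the helper `decode_until_stable` is hypothetical. It suggests that the checkbox fragment is URL-encoded twice, presumably because it sits inside another parameter's value (compare the `Artist=time%3DNeuzeit%26...` parameter in the longer query link quoted in the timeline). Both fragments end up as plain "Dürer", which suggests the URL encoding alone does not explain the mismatch.

```python
from urllib.parse import unquote_plus

# The two fragments quoted above (checkboxes vs. radiobutton).
CHECKBOX_FRAGMENT = "%26artist_a%3DD%25C3%25BCrer%253B%2BAlbrecht"
RADIO_FRAGMENT = "%5Bartist_a%5D=Dürer%3B+Albrecht"


def decode_until_stable(value: str) -> tuple[str, int]:
    """Percent-decode (treating '+' as a space) until the text stops changing.

    Hypothetical diagnostic helper. Returns the decoded text and the number
    of passes that changed it, i.e. how many layers of URL encoding were
    stacked on the value.
    """
    passes = 0
    while True:
        decoded = unquote_plus(value)
        if decoded == value:
            return value, passes
        value = decoded
        passes += 1


if __name__ == "__main__":
    for label, fragment in [("checkboxes", CHECKBOX_FRAGMENT), ("radiobutton", RADIO_FRAGMENT)]:
        text, layers = decode_until_stable(fragment)
        print(f"{label}: {layers} layer(s) -> {text!r}")
```

Repeated decoding is only useful for inspection; a server has to decode each layer exactly once, otherwise a literal "%25" inside a value would be corrupted.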

Event Timeline

The respective query template contains an SMW query such as {{#ask:[[gemacht_von::Dürer]]}}. Under normal conditions, this works. When the PF checkboxes input processes the value, however, "Dürer" seems to be converted into "D&uuml;rer" and no longer matches the original property value. I also tried storing "D&uuml;rer" in the first place, but this simply reverses the process and again produces no match.

The query link to my form currently (22/07/21) looks like this:
https://religion-in-japan.univie.ac.at/Handbuch/Spezial:Abfrage_ausf%C3%BChren/Artist?title=Spezial%3AAbfrage_ausf%C3%BChren%2FArtist&pfRunQueryFormName=Artist&Artist=time%3DNeuzeit%26artist_a%255Bis_list%255D%3D1%26artist_n%255Bis_list%255D%3D1%26artist_m%255Bis_list%255D%3D1&wpRunQuery=Resultate+anzeigen&pf_free_text=&Artist%5Btime%5D=Neuzeit&Artist%5Bartist_a%5D%5Bis_list%5D=1&Artist%5Bartist_n%5D%5B7%5D=D%26uuml%3Brer%3B+Albrecht+%281%29&Artist%5Bartist_n%5D%5Bis_list%5D=1&Artist%5Bartist_m%5D%5Bis_list%5D=1&wpRunQuery=Resultate+anzeigen&pf_free_text=
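Parsing an excerpt of that query string shows what the checkbox actually submitted. This is a Python sketch using the standard library's `parse_qs`; the excerpt keeps only three of the parameters above. After one round of URL decoding, the value of `Artist[artist_n][7]` still contains the HTML entity `&uuml;` rather than the character ü.

```python
from urllib.parse import parse_qs

# Excerpt of the query string quoted above (other parameters omitted).
QUERY = (
    "pfRunQueryFormName=Artist"
    "&Artist%5Btime%5D=Neuzeit"
    "&Artist%5Bartist_n%5D%5B7%5D=D%26uuml%3Brer%3B+Albrecht+%281%29"
)

# parse_qs undoes the percent-encoding once and maps each key to a list of values.
params = parse_qs(QUERY)
checkbox_value = params["Artist[artist_n][7]"][0]

print(repr(checkbox_value))        # 'D&uuml;rer; Albrecht (1)'
print("&uuml;" in checkbox_value)  # True: an HTML entity, not the character ü
```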

Could this be related to T45991?

After further testing, the problem boils down to the following question: is it possible to prevent PF from converting the umlaut from Unicode (ü) to an HTML entity (&uuml;) when reading values from checkboxes? Reason: in an SMW query, ü and &uuml; are treated as different values.
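The mismatch itself is easy to reproduce outside the wiki: a plain string comparison treats "Dürer" and "D&uuml;rer" as different, while decoding HTML character references first makes them equal. The Python sketch below only illustrates the idea; `normalize` is a hypothetical helper, not part of Page Forms or SMW, and the sketch does not say where such a decoding step would belong in PF.

```python
import html

STORED = "Dürer"          # value as stored in the SMW property
SUBMITTED = "D&uuml;rer"  # value as submitted by the checkbox input


def normalize(value: str) -> str:
    """Decode HTML character references such as &uuml; into Unicode characters."""
    return html.unescape(value)


print(STORED == SUBMITTED)                        # False: raw strings differ
print(normalize(STORED) == normalize(SUBMITTED))  # True after decoding
```

Normalizing both sides keeps the comparison symmetric; note that a value which legitimately contains text such as "&amp;" would also be changed by this decoding.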

Bescheid renamed this task from PF checkboxes do not parse special characters correctly to PF checkboxes do not parse umlaut correctly. Aug 4 2022, 10:32 AM
Bescheid assigned this task to Yaron_Koren.

Sorry for the long delay. I can't reproduce this issue: handling of umlauts (and other non-ASCII characters) looks the same to me in both "checkboxes" and "radiobuttons" (and works fine for both). It looks like you are using a rather old version of Page Forms, 4.9.5. Could that be the issue?

Closing this issue - feel free to re-open if it's still a problem in a recent version of Page Forms.

Bescheid reopened this task as Open. May 10 2023, 12:26 PM

Thank you for answering. I tried to update PF, but since I use MW 1.35.5, PF 4.9 seems to be the best I can get.

The problem still exists. If you go to https://religion-in-japan.univie.ac.at/Handbuch/Spezial:Abfrage_ausf%C3%BChren/Artist you will still find "Dürer; Albrecht (1)" in the list. The "(1)" indicates that one result is expected, but if you select "Dürer" and run the query you get no result. Interestingly, this does not happen with names like "Kaihō Yūshō". I assume this is because "Dürer" is converted to "D&uuml;rer" (i.e. the HTML entity for the German umlaut is introduced somewhere instead of the Unicode character), so that "Dürer" and "D&uuml;rer" no longer match.
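One possible explanation for why "Kaihō Yūshō" survives while "Dürer" does not, assuming the value passes through an encoder that only knows the HTML 4.01 named entities (the table PHP's htmlentities() uses by default): ü has a named entity (&uuml;), but ō and ū do not, so they pass through unchanged. The Python sketch below imitates such an encoder with `html.entities.codepoint2name`; the function `encode_named_entities` is hypothetical and is not taken from Page Forms.

```python
from html.entities import codepoint2name


def encode_named_entities(text: str) -> str:
    """Replace each character that has an HTML 4.01 named entity with &name;.

    Hypothetical stand-in for an htmlentities()-style encoder. Characters
    without a named entity in that table are left unchanged.
    """
    return "".join(
        f"&{codepoint2name[ord(ch)]};" if ord(ch) in codepoint2name else ch
        for ch in text
    )


for name in ["Dürer; Albrecht", "Kaihō Yūshō"]:
    print(f"{name!r} -> {encode_named_entities(name)!r}")
# 'Dürer; Albrecht' -> 'D&uuml;rer; Albrecht'
# 'Kaihō Yūshō' -> 'Kaihō Yūshō'
```

Under this hypothesis, other characters with HTML 4.01 named entities (for example ä, ö, ß or é) would be affected in the same way.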

I assume the same problem exists for other characters that are commonly HTML-encoded.

Thank you for responding. You are using a version of Page Forms from about three years ago, so it's hard to know whether this is still an issue. Can you just upgrade to the latest version of Page Forms, i.e. get off the REL1_35 branch? The latest Page Forms code will still work with MW 1.35.

Bescheid changed the task status from Open to Stalled.May 10 2023, 12:40 PM

I just tried and got a "fatal error" (I use MW 1.35 in combination with PHP 8, which was risky but has otherwise worked well so far). So let's postpone the issue until I get my whole wiki updated.

I don't know what the issue is with upgrading Page Forms, since as far as I know the latest version of Page Forms works with both MW 1.35 and PHP 8 (certainly, it works better with PHP 8 than version 4.9 of Page Forms did). Anyway, I hope you're able to upgrade Page Forms at some point. For now, I'm closing this issue again, since as far as I know it's been fixed. As before, feel free to re-open it.