Syntax highlighting in WDQS for non-latin script
Closed, Resolved · Public

Description

Hi,

I noticed that the syntax highlighting in the Wikidata Query Service doesn't work for non-Latin scripts.

For instance, if I take the first cat example and change ?item to ?Элементы (see the Query here), the variable is colored black instead of the usual blue.

Event Timeline

VIGNERON created this task. Jul 26 2017, 7:09 AM
Restricted Application added subscribers: PokestarFan, Aklapper. Jul 26 2017, 7:09 AM
Restricted Application added a project: Discovery. Jul 26 2017, 2:16 PM

CodeMirror uses the RegExp /^[\w\d]*/ to match variable names, which is equivalent to /^[A-Za-z0-9_]*/ and doesn’t include any other Unicode letters. Unfortunately, JavaScript regular expressions don’t have any sort of Unicode categories like \p{L}. (Even ES6’s u modifier doesn’t add them, though it at least makes \p an error so that a later ES version can define it without breaking compatibility.) So there’s no straightforward way to fix this, except by compiling a list of all Unicode letter ranges and hard-coding that into the RegExp.
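
The behaviour described in this comment can be checked directly. A minimal sketch, using the pattern exactly as quoted; the names asciiVariableName and matchedPrefix are made up for this illustration and are not taken from CodeMirror:

```javascript
// The variable-name pattern as quoted in the comment above.
const asciiVariableName = /^[\w\d]*/;

// Hypothetical helper: return the leading part of `text` that the pattern accepts.
function matchedPrefix(pattern, text) {
  const m = pattern.exec(text);
  return m ? m[0] : "";
}

// An ASCII name is consumed in full.
console.log(JSON.stringify(matchedPrefix(asciiVariableName, "item")));
// A Cyrillic name yields only the empty match, so nothing after the "?" is
// recognised as part of the variable.
console.log(JSON.stringify(matchedPrefix(asciiVariableName, "Элементы")));
```

Because the pattern uses `*`, it still succeeds on the Cyrillic input, but with a zero-length match, which would be consistent with the name not being colored as a variable.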

Jonas closed this task as Declined. Jul 31 2017, 9:39 AM
Jonas added a subscriber: Jonas.

Sorry for the inconvenience!

@Jonas, is there not a pattern that we could use instead of /^[\w\d]*/? A variable is a question mark followed by non-whitespace characters, followed by a space or by ) or } or .

My regex is not great, but \?(\S*?)[\s\)}\.]
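
To see what the suggested pattern would capture, here is a small sketch that runs it against a few inputs. The helper name captureVariable and the sample strings are assumptions made for this illustration; how such a pattern would fit into CodeMirror's tokenizer is not covered here.

```javascript
// The pattern suggested above: "?" followed by a lazy run of non-whitespace,
// ended by whitespace, ")", "}" or ".".
const proposedPattern = /\?(\S*?)[\s\)}\.]/;

// Hypothetical helper: return the captured variable name, or null if no match.
function captureVariable(text) {
  const m = proposedPattern.exec(text);
  return m ? m[1] : null;
}

console.log(captureVariable("SELECT ?Элементы WHERE")); // Cyrillic name
console.log(captureVariable("(?item)"));                // ended by ")"
console.log(captureVariable("?item."));                 // ended by "."
console.log(captureVariable("?a+b "));                  // punctuation is accepted too
console.log(captureVariable("?item"));                  // no terminator follows
```

The pattern does pick up the Cyrillic name, but since \S accepts any non-whitespace character, it also treats punctuation such as "+" as part of the name, and it finds nothing when the variable is the last thing in the input.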

Amire80 added a subscriber: Amire80.

Why Declined?

The problem is real.

Perhaps it can be tagged as "Upstream" or "Stalled" or something.

Amire80 reopened this task as Open. Dec 7 2018, 8:28 PM
Smalyshev changed the task status from Open to Stalled. Dec 7 2018, 8:35 PM
Smalyshev edited projects, added Wikidata Query UI; removed Wikidata-Query-Service.

It seems to be fixed, doesn't it?

Lucas_Werkmeister_WMDE closed this task as Resolved. Jul 31 2019, 10:36 AM

Indeed – this seems to have been fixed in codemirror/CodeMirror#5936 (using the approach I mentioned above: hard-code the ranges of all Unicode letters into the regexp), which was released in 5.48.2 12 days ago, and I guess we installed that version?
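
As a rough illustration of the hard-coded-ranges approach, the sketch below adds a single extra block to the ASCII character class. This is not the regexp from the CodeMirror change: hardCodedSubset is a hypothetical pattern covering only the Cyrillic block, and leadingName is a made-up helper. For comparison it also shows Unicode property escapes, which were standardized in ES2018 and are assumed to be available in the engine running the snippet.

```javascript
// Hypothetical subset: ASCII word characters plus the Cyrillic block
// (U+0400 to U+04FF). A complete list of letter ranges would be far longer.
const hardCodedSubset = /^[A-Za-z0-9_\u0400-\u04FF]*/;

// ES2018 Unicode property escapes (these require the u flag), for comparison.
const propertyEscape = /^[\p{L}\p{N}_]*/u;

// Hypothetical helper: return the leading part of `text` that the pattern accepts.
function leadingName(pattern, text) {
  const m = pattern.exec(text);
  return m ? m[0] : "";
}

// The Cyrillic name is now matched up to the following space.
console.log(leadingName(hardCodedSubset, "Элементы WHERE"));
// Scripts outside the listed ranges are still missed by the subset...
console.log(JSON.stringify(leadingName(hardCodedSubset, "変数")));
// ...while the property-escape version accepts any Unicode letter.
console.log(leadingName(propertyEscape, "変数"));
```

The trade-off is that a hard-coded list works in engines without property escapes, but it only covers the ranges someone remembered to include.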

(Side note, we should really start committing package-lock.json in the GUI (T179229), so that package updates only happen when we explicitly do them in the repository. In this case the automatic update on npm install is nice, but this could also introduce any number of security vulnerabilities…)