Fix Unicode string comparison
AcceptedPublic

Authored by Lucas_Werkmeister_WMDE on Aug 29 2018, 10:43 AM.

Details

Reviewers
Magnus
Patch without arc
git checkout -b D1105 && curl -L https://phabricator.wikimedia.org/D1105?download=true | git apply
Summary

Canonical Decomposition is not the form of strings used by Wikibase (Wikibase uses Canonical Composition), and it is not usually the form in which user input is received (some browsers explicitly normalize inputs to Canonical Composition, and other input is also usually in a normalized form), so if we only Decompose one half of the values when comparing them, we will usually end up reporting a mismatch. This manifests itself as an error (“base statement not found”) when attempting to add a statement with qualifiers whose value contains umlauts: when attempting to add the qualifiers, QuickStatements will be unable to locate the statement it just added.

The fix for this is simple: normalize both sides when comparing strings. In that case, it also doesn’t matter which normalization form we use.

Diff Detail

Repository
R2010 tool-quickstatements
Lint
Lint Skipped
Unit
Unit Tests Skipped
Lucas_Werkmeister_WMDE requested review of this revision.EditedAug 29 2018, 10:43 AM
Lucas_Werkmeister_WMDE created this revision.
Lucas_Werkmeister_WMDE edited the summary of this revision. (Show Details)

Here’s the same commit as a patch you can paste directly into git am:

1From 6f6b3533ad4c8ec6d1b74186d2d840a8502bfb2c Mon Sep 17 00:00:00 2001
2From: Lucas Werkmeister <mail@lucaswerkmeister.de>
3Date: Wed, 29 Aug 2018 12:32:33 +0200
4Subject: [PATCH] Fix Unicode string comparison
5MIME-Version: 1.0
6Content-Type: text/plain; charset=UTF-8
7Content-Transfer-Encoding: 8bit
8
9Canonical Decomposition is not the form of strings used by Wikibase
10(Wikibase uses Canonical Composition), and it is not usually the form in
11which user input is received (some browsers [1] explicitly normalize
12inputs to Canonical Composition, and other input is also usually in a
13normalized form), so if we only Decompose one half of the values when
14comparing them, we will usually end up reporting a mismatch. This
15manifests itself as an error (“base statement not found”) when
16attempting to add a statement with qualifiers whose value contains
17umlauts: when attempting to add the qualifiers, QuickStatements will be
18unable to locate the statement it just added.
19
20The fix for this is simple: normalize both sides when comparing strings.
21In that case, it also doesn’t matter which normalization form we use.
22
23[1]: https://stackoverflow.com/a/11190012
24---
25 public_html/quickstatements.php | 4 +++-
26 1 file changed, 3 insertions(+), 1 deletion(-)
27
28diff --git a/public_html/quickstatements.php b/public_html/quickstatements.php
29index 5e40e6e..2471cea 100644
30--- a/public_html/quickstatements.php
31+++ b/public_html/quickstatements.php
32@@ -483,7 +483,9 @@ protected function getStatementID ( $command ) {
33 protected function compareDatavalue ( $d1 , $d2 ) {
34 if ( $d1->type != $d2->type ) return false ;
35 if ( $d1->type == 'string' ) {
36- return normalizer_normalize($d1->value,Normalizer::FORM_D) == $d2->value; # Yay Unicode!
37+ $value1 = normalizer_normalize($d1->value,Normalizer::FORM_D);
38+ $value2 = normalizer_normalize($d2->value,Normalizer::FORM_D);
39+ return $value1 == $value2;
40 }
41 if ( $d1->type == 'quantity' ) return $d1->value->amount*1 == $d2->value->amount*1 ;
42 if ( $d1->type == 'time' ) {
43--
442.17.1
45

Lucas_Werkmeister_WMDE edited the summary of this revision. (Show Details)Aug 29 2018, 10:45 AM
Magnus accepted this revision.Aug 29 2018, 10:50 AM

Applied!

This revision is now accepted and ready to land.Aug 29 2018, 10:50 AM