
Entities returned by mw.wikibase.getEntity lua function differ based on language of the viewer
Closed, Invalid · Public

Description

The mw.wikibase.getEntity Lua function, enabled for Structured Data on Commons (SDC) by T223792, returns an SDC entity. The easiest way to inspect it is with the mw.dumpObject function. The output of mw.dumpObject (with the "statements" table collapsed) for File:Indoor_Climbing_Kid.jpg (M4184419) is

`table#1 {

metatable = table#2
["id"] = "M4184419",
["labels"] = table#3 {
  metatable = table#4
  ["en"] = table#5 {
    ["language"] = "en",
    ["value"] = "A five year old hanging around bouldering wall in Sportrock climbing gym in Alexandria, Virginia, USA",
  },
},
["schemaVersion"] = 2,
["statements"] = table#6 { ... },
["type"] = "mediainfo",

}`

which is correct, as the file has only an English caption. That changes when I switch my language from English to Polish, and then I get

`table#1 {

metatable = table#2
["id"] = "M4184419",
["labels"] = table#3 {
  metatable = table#4
  ["en"] = table#5 {
    ["language"] = "en",
    ["value"] = "A five year old hanging around bouldering wall in Sportrock climbing gym in Alexandria, Virginia, USA",
  },
  ["pl"] = table#6 {
    ["language"] = "en",
    ["value"] = "A five year old hanging around bouldering wall in Sportrock climbing gym in Alexandria, Virginia, USA",
  },
},
["schemaVersion"] = 2,
["statements"] = table#7 { ... },
["type"] = "mediainfo",

}`

The entity returned should not depend on the user's language.

Event Timeline

I am able to reproduce this on Commons, but not with my local setup.
There are a few other related patches coming up soon-ish, and I'll pick this back up once those are settled.

Based on our conversation in T231952, a couple of files on Commons that I suspect have been affected by this bug are:

File:Póvoa de Varzim -i---i- (25379025808).jpg : has a caption in English, while wbc_entity_usage indicates usage of both an English and a Portuguese caption:

SELECT *
    -> FROM wbc_entity_usage
    -> WHERE eu_page_id = 68860692;
+------------+--------------+-----------+------------+
| eu_row_id  | eu_entity_id | eu_aspect | eu_page_id |
+------------+--------------+-----------+------------+
| 2051749177 | M68860692    | L.en      |   68860692 |
| 2055570857 | M68860692    | L.pt      |   68860692 |
+------------+--------------+-----------+------------+
2 rows in set (0.00 sec)

File:Bolide.jpg : has captions in English and Chinese, while wbc_entity_usage indicates 31 languages:

SELECT *
    -> FROM wbc_entity_usage
    -> WHERE eu_page_id = 10184478;
+------------+--------------+-------------+------------+
| eu_row_id  | eu_entity_id | eu_aspect   | eu_page_id |
+------------+--------------+-------------+------------+
| 2037902202 | M10184478    | L.fr        |   10184478 |
| 2059352313 | M10184478    | L.tr        |   10184478 |
| 2063734518 | M10184478    | L.be-tarask |   10184478 |
| 2064893164 | M10184478    | L.ky        |   10184478 |
| 2065176878 | M10184478    | L.en        |   10184478 |
| 2067319129 | M10184478    | L.zh-hk     |   10184478 |
| 2069482808 | M10184478    | L.ru        |   10184478 |
| 2070527405 | M10184478    | L.hr        |   10184478 |
| 2074289864 | M10184478    | L.az        |   10184478 |
| 2074787102 | M10184478    | L.uk        |   10184478 |
| 2079819984 | M10184478    | L.fa        |   10184478 |
| 2079915614 | M10184478    | L.zh        |   10184478 |
| 2082093218 | M10184478    | L.et        |   10184478 |
| 2083792222 | M10184478    | L.zh-tw     |   10184478 |
| 2085056560 | M10184478    | L.ar        |   10184478 |
| 2085263594 | M10184478    | L.lv        |   10184478 |
| 2091202957 | M10184478    | L.pl        |   10184478 |
| 2097044668 | M10184478    | L.fi        |   10184478 |
| 2097635187 | M10184478    | L.cs        |   10184478 |
| 2098189412 | M10184478    | L.ta        |   10184478 |
| 2100409677 | M10184478    | L.mk        |   10184478 |
| 2102465127 | M10184478    | L.sr        |   10184478 |
| 2105562109 | M10184478    | L.sh        |   10184478 |
| 2106360150 | M10184478    | L.ko        |   10184478 |
| 2110227952 | M10184478    | L.kk        |   10184478 |
| 2116158523 | M10184478    | L.nl        |   10184478 |
| 2130720957 | M10184478    | L.be        |   10184478 |
| 2131162549 | M10184478    | L.af        |   10184478 |
| 2142902515 | M10184478    | L.zh-cn     |   10184478 |
| 2143168287 | M10184478    | L.vi        |   10184478 |
| 2149523316 | M10184478    | L.de        |   10184478 |
+------------+--------------+-------------+------------+
31 rows in set (0.00 sec)

matthiasmullie renamed this task from "Structured Data on Commons entities returned by mw.wikibase.getEntity lua function differ based on language of the viewer" to "Entities returned by mw.wikibase.getEntity lua function differ based on language of the viewer". Mar 26 2020, 2:32 PM

Apologies it took a while to get back to this! Unfortunately, this has not resolved itself over time :)

This is not MediaInfo/Commons-specific; it also happens on Wikidata in a similar scenario. The easiest way to check: run the line below in the Lua debug console (e.g. on https://www.wikidata.org/w/index.php?title=Module:Sandbox&action=edit) with your language set to anything other than English (uselang won't work, though; you have to actually change your display language):
mw.logObject( mw.wikibase.getEntity( 'Q6267810' ) )

You'll find that the output includes this:

["labels"] = table#10 {
  metatable = table#11
  ["en"] = table#12 {
    ["language"] = "en",
    ["value"] = "Johnny Turbo",
  },
  ["pl"] = table#12 {
    ["language"] = "en",
    ["value"] = "Johnny Turbo",
  },
},

There is a pl entry (or whatever non-English language you selected), even though the entity doesn't have a label in that language.

This happens inside ClientEntitySerializer::serialize(): the original entity $serialization is just fine, but it looks like the fallback chain handling then adds the other entry (pl in this case).
Looking at the code, this is probably intentional behavior, and one could verify languages with the language key included for each label entry.
I'm not 100% certain it should be like this, though - i.e. it also seems to affect wbc_entity_usage results (T238484#5738295).
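
The check mentioned above (comparing each entry's key with its "language" field) could look roughly like the sketch below. It is a minimal illustration run against a plain sample table shaped like the output shown earlier; ownLanguageLabels is a hypothetical helper name, not part of mw.wikibase, and how it interacts with the metatables on real entity tables has not been verified here.

```lua
-- Sample data shaped like the "labels" table above (viewer language: pl).
local labels = {
  en = { language = "en", value = "Johnny Turbo" },
  pl = { language = "en", value = "Johnny Turbo" },  -- fallback entry
}

-- Hypothetical helper: keep only entries whose key matches their own
-- "language" field, i.e. labels actually stored in that language.
local function ownLanguageLabels(labelTable)
  local result = {}
  for code, entry in pairs(labelTable or {}) do
    if entry.language == code then
      result[code] = entry
    end
  end
  return result
end

local own = ownLanguageLabels(labels)
print(own.en and own.en.value)  -- Johnny Turbo
print(own.pl)                   -- nil: the pl entry was a fallback
```

In a module, the same function would presumably be applied to the labels table of the entity returned by mw.wikibase.getEntity.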

I'll defer to the Wikidata team about what (if anything) should happen here.

I never ran into this when processing Wikidata entities, so it is interesting that it happens there too. In my code, I began to test that the two languages match and to ignore the rest, but if this is a "feature" and not a "bug" I can probably simplify some of that code.

So, does anyone know whether this is a feature or a bug? This behavior appears to be intentional and not causing any trouble when testing whether or not languages match, in which case I believe we can close this ticket?

> This behavior appears to be intentional and not causing any trouble when testing whether or not languages match, in which case I believe we can close this ticket?

Agree

Apologies @matthiasmullie and @Jarekt for the embarrassingly slow response on this one. Somehow our chain of pings and responses got lost here at WMDE.
The behaviour described is how it is intended to be (i.e. a feature, not a bug).
We do see, in retrospect, that it is a bit odd/not the most obvious way, but for the time being it is the expected behaviour.
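
As a rough illustration of the behaviour described in this thread (not the actual Wikibase implementation), the sketch below mimics how an entry for the viewer's language could be filled in from a fallback chain. The function name withFallbackLabel and the chain { "pl", "en" } are assumptions made for the example.

```lua
-- Hypothetical helper, NOT Wikibase code: copy the stored labels and, if the
-- viewer's language has no label, add an entry under that language code taken
-- from the first language in the fallback chain that does have one.
local function withFallbackLabel(labels, viewerLang, chain)
  local result = {}
  for code, entry in pairs(labels) do
    result[code] = entry
  end
  if result[viewerLang] == nil then
    for _, code in ipairs(chain) do
      local entry = labels[code]
      if entry then
        -- The key is the viewer's language; "language" keeps the source language.
        result[viewerLang] = { language = entry.language, value = entry.value }
        break
      end
    end
  end
  return result
end

local stored = { en = { language = "en", value = "Johnny Turbo" } }
local seen = withFallbackLabel(stored, "pl", { "pl", "en" })  -- assumed chain
print(seen.pl.language)  -- en
print(seen.pl.value)     -- Johnny Turbo
```

This matches the dumps in this task: the key is the viewer's language, while the "language" field still names the language the label was actually written in.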

@WMDE-leszek, is there documentation for this feature? I would like to understand it better.