Page MenuHomePhabricator

Module:Wikidata label should use mw.wikibase.label and mw.wikibase.sitelink
Closed, ResolvedPublic

Description

Module:Wikidata label should use mw.wikibase.label and mw.wikibase.sitelink whenever possible instead of loading the whole entity.

This is faster (no need to push the whole entity structure to Lua) and it allows more narrow usage tracking, which will make it possible to be more informed about what data exactly is used on a given page.

Event Timeline

hoo created this task.Aug 12 2017, 4:16 PM
Restricted Application added a subscriber: PokestarFan. · View Herald TranscriptAug 12 2017, 4:16 PM
daniel added a subscriber: daniel.

Some time ago @Jarekt build https://commons.wikimedia.org/wiki/Module:Wikidata_label and it's awesome! I had a look at it with @hoo and he pointed out that the module could probably be optimized using new functionality to not grab the whole entity, but only the needed parts.

The module is used on almost every file page so making it more efficient is probably worth it.

Just talked with @daniel about this, we have to figure out fallback. Filed T173207 for that part

See also T171392. When using Wikidata label in large quantities together with Langswitch, OOM.

I will look into it, but the issue is that mw.wikibase.label and mw.wikibase.sitelink functions do not specify input language parameter or have capability to return all labels or sitelinks. Module:Wikidata_label module take language as an input parameter. mw.wikibase.getLabelWithLang is a little better because I know what language is returned and if it matches what is requested than I can take it, but there is no equivalent function for sitelinks. mw.wikibase.sitelink is not documented well so it is unclear what will be returned on Commons, so I will have to do some experiments with it to figure out if I can use it.

hoo added a comment.Aug 13 2017, 3:30 PM

mw.wikibase.sitelink returns the page title of an Item on the current wiki. Sadly we currently don't have an alternative to entity:getSitelink which doesn't require loading the whole entity. Adding this is tracked in T142903: Get sitelink from Item without loading the full Item (via Lua). for that.

Optimal solution for me would be to modify mw.wikibase.label to take optional second parameter wikibase.label( id, langCode ) , which would be the same as used by entity:getLabel. Another approach could be to create a new function, perhaps mw.wikibase.labels, which returns a table with all the labels. For Q79822 it would be { ["pl"]="Adam Mickiewicz", ["ru"]="Адам Мицкевич", ...} .

Jarekt claimed this task.Aug 16 2017, 11:40 AM
Jarekt closed this task as Resolved.Oct 12 2017, 1:14 PM

I deployed new Module:Wikidata_label code and verified that a page showing "Berlin" label does not load the whole entity to display it.

Similar task needs to be done to Module:Authority_control which I will also do.

hoo added a comment.EditedOct 12 2017, 2:01 PM

I deployed new Module:Wikidata_label code and verified that a page showing "Berlin" label does not load the whole entity to display it.
Similar task needs to be done to Module:Authority_control which I will also do.

	-- hard way to get label
	if not label then
		entity = entity or mw.wikibase.getEntity(item); 
		for i, language in ipairs(langList) do 
			label = entity:getLabel(language)
			if label then break end -- label found and we are done
		end
	end

Why is that needed? This is very bad, as it will add all usages (and later on usages for labels on a per language basis) without need… can we avoid this? Why doesn't mw.wikibase.getLabelWithLang suffice? It applies fallbacks.

The input to the template:label is q-code and the language. Language is optional and the code you highlighted will not be used if the language of the user matches the language returned by mw.wikibase.getLabelWithLang, which supposed to happen if language is not provided to the template. So that code would only be used in a rare case someone requests label in some specific language different than what is returned by mw.wikibase.getLabelWithLang.

hoo added a comment.Oct 12 2017, 2:38 PM

The input to the template:label is q-code and the language. Language is optional and the code you highlighted will not be used if the language of the user matches the language returned by mw.wikibase.getLabelWithLang, which supposed to happen if language is not provided to the template. So that code would only be used in a rare case someone requests label in some specific language different than what is returned by mw.wikibase.getLabelWithLang.

Ah, I see… that's good. But in that case you probably want the code to read:

	-- hard way to get label
	if not label and $LangExplicitlyGiven$ then
		entity = entity or mw.wikibase.getEntity(item)
		label = entity:getLabel(lang)
	end
  1. No need to do this if lang was not explicitly asked for. 2. No need to go through the whole list again.

Ok, lets look at the whole code for fetching label:

	-- try easy way to get label (visible part of the link)
	if not entity then
		label, language = mw.wikibase.getLabelWithLang( item )
		if lang~=language then
			label = nil -- does not match desired language so lets nuke it
		end
	end
	
	-- hard way to get label
	if not label then
		entity = entity or mw.wikibase.getEntity(item); 
		for i, language in ipairs(langList) do 
			label = entity:getLabel(language)
			if label then break end -- label found and we are done
		end
	end
	if not label then -- no labels found so just show the q-code
		label = item
	end
  1. The function can be called either with item (from templates) or with entity (possibly from other lua codes). If we already have entity than we skip the 1st IF block and go directly to 2nd one as there is no need to call Wikidata again.
  2. In the 2nd block we have the loop over languages in the fallback list so if you ask for "zh" but we do not have it than you will get "en", or other label in the language on the fallback list. I assume that is equivalent to what you would find in getLabelWithLang
  3. If everything else fails the label will be the q-code
  4. 1st IF block can be done better as instead of lang~=language we should check if language is on the langList fallback list. That should result in fewer cases going to the 2nd IF block. I will fix that.

If there are other improvements one can make, please let me know.

hoo added a comment.Oct 19 2017, 9:18 AM

Well, entity:getLabel should not be looped with an explicit language given if you want to get a language in the user's language. If you manually construct the fallback chain we on our end can't detect that you actually wanted the user language… also this is not needed as getLabel() already takes fallbacks into account (for the user's language only).

Also you still execute the second block, even if there's not going to be a label, this is in case langList contains just the user's language fallback chain. That might unnecessarily load the whole entity.

I am testing new code in Module:Wikidata label/sandbox. The label selection code becomes

	userLang = mw.getCurrentFrame():callParserFunction( "int", "lang" )
	
	-- try easy way to get label (visible part of the link)
	-- call if requesting label in user's language, but skip if we already have entity
	if (userLang==lang) and (not entity) then
		label, language = mw.wikibase.getLabelWithLang( item )
	end

	-- hard way to get label
	-- used if requesting label in language different than user's, or if we already have entity
	if not label then
		entity = entity or mw.wikibase.getEntity(item) -- load entity if we do not have it yet
		for _, language in ipairs(langList) do -- loop over language fallback list
			label = entity:getLabel(language)
			if label then break end -- label found and we are done
		end
	end
	if not label then -- no labels found so just show the q-code
		label = item
	end

The language loop over getLabel() in the second block will be executed only in a rare cases when one of following things happen:

  1. someone called the template while specifying language other than user's language. For example see Template:City Examples where name of Beijing is displayed in English, Chinese and in Arabic. That should be a rare case mostly used for template testing
  2. We already fetched the whole entity for something else and there is no need for another call to get the label
  3. getLabelWithLang returned nul label

Would that fix the code in your view?

If getLabelWithLang was modified to take second language input parameter we could skip if (userLang==lang) test and let getLabelWithLang handle first case as well.

I assume that the second block is more efficient if we already have entity and are just parsing it, as oppose to fetching data again from wikidata. But if it does not make much of a difference, with second parameter to getLabelWithLang we could get rid of the second block and rely only on that function.

As reported here, something is not working. Try this code in the Lua console:

= require('Module:Wikidata label')._getLabel('Q123', 'en')
= require('Module:Wikidata label')._getLabel('Q321', 'en')

You will obtain:

[[w:en:September|September]]
[[w:en:September|Milky Way]]

Instead of:

[[w:en:September|September]]
[[w:en:Milky Way|Milky Way]]

It seems a bug on a sort of static cache on the wililink.

Lydia_Pintscher reopened this task as Open.Oct 22 2017, 8:43 PM

I restored earlier version of the module code that did not use mw.wikibase.sitelink function. Valerio.bozzolan test can still be tried on the /Sandbox version of the module. See screen shot below. It seems to me that mw.wikibase.getLabelWithLang function worked fine, and only the sitelink come from an item that the call should not know anything about.

I tried :

= require('Module:Test')._enSitelink('Q123')
September
= require('Module:Test')._enSitelink('Q321')
Milky Way

Where

function p._enSitelink(item)
	return mw.wikibase.sitelink( item, 'enwiki' )
end

and it seems to work fine. It would be nice to reproduce the issue using much simpler code than require('Module:Wikidata label')._getLabel

valerio.bozzolan added a comment.EditedOct 23 2017, 9:24 PM

I'm confused. Very confused.

From this diff I've added these lines to debug the value of item:

label = 'Normal: ' .. item
sitelink = 'Absurd: ' .. item

Yes, there are no other lines of code between these two lines.

Now:

=require('Module:Wikidata label/sandbox')._getLabel('Q123', 'en')
-- Output:
-- [[w:en:Absurd: Q123September|Normal: Q123]]

That's normal. But running suddently then:

=require('Module:Wikidata label/sandbox')._getLabel('Q321', 'en')
-- Output:
-- [[w:en:Absurd: Q123September|Normal: Q321]]

W-T-F - I'm - seeing.

Too Internet for today. Good night guys.

hoo closed this task as Resolved.Oct 23 2017, 10:23 PM

I'm confused. Very confused.
From this diff I've added these lines to debug the value of item:

label = 'Normal: ' .. item
sitelink = 'Absurd: ' .. item

Yes, there are no other lines of code between these two lines.
Now:

=require('Module:Wikidata label/sandbox')._getLabel('Q123', 'en')
-- Output:
-- [[w:en:Absurd: Q123September|Normal: Q123]]

That's normal. But running suddently then:

=require('Module:Wikidata label/sandbox')._getLabel('Q321', 'en')
-- Output:
-- [[w:en:Absurd: Q123September|Normal: Q321]]

W-T-F - I'm - seeing.
Too Internet for today. Good night guys.

This actually took me a while to figure: The variable link in your function is not local, thus leaking state across function calls, that's why this behaves so weirdly. Declare it local and it should all be sane again.

valerio.bozzolan added a comment.EditedOct 24 2017, 10:14 AM

That have sense, but...

Anyway I don't see how that could explain the fact that, in the second call, the debugged value of item is not the same in both my debugged fields.

I mean, that value can be erroneus, or correct, but not both correct and erroneous in the same line of code.

hoo added a comment.Oct 24 2017, 12:50 PM

That have sense, but...
Anyway I don't see how that could explain the fact that, in the second call, the debugged value of item is not the same in both my debugged fields.
I mean, that value can be erroneus, or correct, but not both correct and erroneous in the same line of code.

Well, the sitelink = 'Absurd: ' .. item declaration is ineffective in that case, as it it's not taken into account if link is already set (if not link then-- apply default "Wikipedia" link type).

This actually took me a while to figure: The variable link in your function is not local, thus leaking state across function calls, that's why this behaves so weirdly. Declare it local and it should all be sane again.

@hoo, Thanks for figuring it out. I was equally confused as @valerio.bozzolan and your explanation brings sanity into this code. I checked and declared all variables, tested the code and indeed it worked fine. The new version is live again.

valerio.bozzolan added a comment.EditedOct 24 2017, 8:49 PM

I still don't find consistence between the fix and mine old debugged sandbox patch, that I re-report:

-- ...
label    = 'Normal: ' .. item
sitelink = 'Absurd: ' .. item
-- ...

Both these assignments read from the same stack state but apparently both receive different values of item. The first receives Q321 and the second Q123 as reported in the output [[w:en:Absurd: Q123September|Normal: Q321]].

Ghosts.

hoo added a comment.Oct 25 2017, 8:37 AM

I still don't find consistence between the fix and mine old debugged sandbox patch, that I re-report:

-- ...
label    = 'Normal: ' .. item
sitelink = 'Absurd: ' .. item
-- ...

Both these assignments read from the same stack state but apparently both receive different values of item. The first receives Q321 and the second Q123 as reported in the output [[w:en:Absurd: Q123September|Normal: Q321]].
Ghosts.

Alright, I reduced this test case:

p = {}

function p._getLabel(item, lang, link_type, capitalization)
	local entity, langList, s, sitelink, label, dLink, userLang

	-- Lua nonsense?
	label = 'Normal: ' .. item
	sitelink = 'Absurd: ' .. item

	if not link then-- apply default "Wikipedia" link type
		sitelink = sitelink .. 'TheSiteLink' -- "DEBUG"
		link = mw.ustring.format('w:%s:%s', lang, sitelink)
	end

	-- return the results
	if link~='' then
		return mw.ustring.format('[[%s|%s]]', link, label) -- return link
	else
		return label -- return just a label
	end
end

return p

Yields:

= p._getLabel( 'Q2', 'de' )
[[w:de:Absurd: Q2TheSiteLink|Normal: Q2]]
= p._getLabel( 'Q3', 'de' )
[[w:de:Absurd: Q2TheSiteLink|Normal: Q3]]

As explained above, this happens as (after the first run) link is set globally, thus the first if is not executed, leading to the old value (from the first function call) being placed in link, instead of the current/ new value of sitelink.

Oh yeah! I'm in. Thanks :)