
Many Wikipedias' Wikidata modules iterate over all entity claims when a Statement is looked up by property label
Closed, Resolved (Public)

Description

Yesterday I tried to enable Statement usage tracking on cawiki (which means we track exactly which Statement has been used, and not just that "all entity data" is used). While doing this I discovered that a great many usages on cawiki are added needlessly due to a performance bug in their Mòdul:Wikidata (https://ca.wikipedia.org/w/index.php?title=M%C3%B2dul_Discussi%C3%B3:Wikidata&oldid=18938979#Critical_performance_improvement).

The problematic code

		-- otherwise, iterate over all properties, fetch their labels and compare this to the given property name
		for k, v in pairs(entity.claims) do
			if mw.wikibase.label(k) == property then return v end
		end
		return

can easily be replaced with

		property = mw.wikibase.resolvePropertyId(property)
		if not property then return end

		return entity.claims[property]
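To make the difference concrete, here is a minimal sketch that runs in plain Lua outside of Scribunto. The `mw` table below is only a stand-in for the real `mw.wikibase` library, and the labels, the entity data, the lookup counter and the two function names are all made up for illustration. In the real client, the loop touches every statement group of the entity, which is what ends up being recorded as usage; the counter here only mimics that.

```lua
-- Stand-in for Scribunto's mw.wikibase so that this sketch runs in plain Lua.
-- The labels and the call counter are hypothetical, for illustration only.
local labelLookups = 0
local labels = { P17 = "country", P31 = "instance of", P569 = "date of birth" }
local mw = { wikibase = {} }

function mw.wikibase.label(id)
	labelLookups = labelLookups + 1
	return labels[id]
end

-- Mimics the documented behaviour: accepts a property label or a property ID.
function mw.wikibase.resolvePropertyId(labelOrId)
	if labels[labelOrId] then return labelOrId end
	for id, label in pairs(labels) do
		if label == labelOrId then return id end
	end
	return nil
end

-- Made-up entity with three statement groups.
local entity = { claims = { P17 = { "s1" }, P31 = { "s2" }, P569 = { "s3" } } }

-- The problematic pattern: one label lookup per property on the entity.
local function findBySlowScan(entity, property)
	for k, v in pairs(entity.claims) do
		if mw.wikibase.label(k) == property then return v end
	end
	return nil
end

-- The replacement: resolve the ID once, then index the claims directly.
local function findByResolvedId(entity, property)
	local id = mw.wikibase.resolvePropertyId(property)
	if not id then return nil end
	return entity.claims[id]
end

-- A label that matches nothing forces the scan through every property.
findBySlowScan(entity, "no such label")
print(labelLookups)
```

With the direct lookup, only the one requested property is read from `entity.claims`, so only that Statement needs to be tracked.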

The problematic code is also on several other wikis: P6114

Event Timeline

  • Easy/workaround - When code calls pairs on a pseudo table (entity.claims as here, and also entity.labels and entity.descriptions once T172914 is merged), we should probably work around it upstream, either in UsageAggregator (T178079) or in Lua (whenever pairs is called on entity.claims, count it as C.* instead of many rows).
    • wbc_entity_usage will then not get overloaded with too many rows. This is just a workaround to avoid unintentional usage, as it still makes the EU inefficient (from the rc side)
  • Medium - Le Tour de Wikí 2017: go over all wikis and fix them (there is no central fix - T121470 T41610)

Change 383990 had a related patch set uploaded (by Eranroz; owner: Eranroz):
[mediawiki/extensions/Wikibase@master] Access to property by name

https://gerrit.wikimedia.org/r/383990

Change 383990 merged by jenkins-bot:
[mediawiki/extensions/Wikibase@master] getAllStatements and access to property by name

https://gerrit.wikimedia.org/r/383990

This will go live this week. It would be good to measure the impact.

The above patch (383990) doesn't have any impact by itself. Wikis have to adopt it; it is just a convenient method to avoid improper property usage.

Ladsgroup lowered the priority of this task from High to Low. Feb 2 2018, 2:59 AM

Given that we built T185693: Implement a (more liberal) usage aspect deduplicater (days: 3), this can't blow up the database anymore.
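As a rough illustration of what such deduplication amounts to (this is not the Wikibase implementation, which lives in PHP), the sketch below collapses per-property statement usages such as "C.P31" into a single unmodified "C" usage once there are more than a given number of them. The function name, the usage strings used in the example and the limit are assumptions made for this sketch.

```lua
-- Hypothetical sketch: collapse per-property statement usages ("C.P31", ...)
-- into one general "C" usage once there are more than `limit` of them.
-- Usages for other aspects (for example "L.en") are passed through untouched.
local function collapseStatementUsages(usages, limit)
	local statementUsages, result = {}, {}
	for _, usage in ipairs(usages) do
		if usage:match("^C%.") then
			statementUsages[#statementUsages + 1] = usage
		else
			result[#result + 1] = usage
		end
	end

	if #statementUsages > limit then
		-- Too many distinct statements: record "all statements" as one row.
		result[#result + 1] = "C"
		return result
	end

	-- Few enough statements: keep the fine-grained rows.
	for _, usage in ipairs(statementUsages) do
		result[#result + 1] = usage
	end
	return result
end

local collapsed = collapseStatementUsages({ "L.en", "C.P17", "C.P31", "C.P569" }, 2)
print(table.concat(collapsed, ", "))
```

The trade-off is the one discussed above: the number of rows in wbc_entity_usage stays bounded, but a page that falls back to the general usage is again notified about every statement edit on that entity.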

I think this task can be considered resolved (mainly thanks to T185693) from an engineering POV.
If there are still wikis using such iterations, they should fix them, but that is now in their (the community's) hands.
I suggest that before moving this task to resolved, we ask Community-Relations-Support to make the wikis aware of it.

Anyway, we should keep an eye on wikis with abnormal usage tracking - this may be more fruitful, and surely more fun, to do with ML/outlier detection based on the data (wbc_entity_usage) rather than by grepping the code :)

BTW I fixed it in the following diffs for some wikis:

MW template:
https://www.mediawiki.org/w/index.php?title=Module:Wikidata&diff=2728318&oldid=2412211

Wikipedias and other projects (not all projects...):
https://sq.wikipedia.org/w/index.php?title=Moduli:Wikidata&diff=1844990&oldid=1757199
https://ps.wikipedia.org/w/index.php?title=Module:Wikidata&diff=prev&oldid=217026
https://ne.wikipedia.org/w/index.php?title=%E0%A4%AE%E0%A5%8B%E0%A4%A1%E0%A5%8D%E0%A4%AF%E0%A5%81%E0%A4%B2:Wikidata&diff=prev&oldid=633773
https://mn.wikipedia.org/w/index.php?title=Module:Wikidata&diff=prev&oldid=538250
https://mk.wikipedia.org/w/index.php?title=%D0%9C%D0%BE%D0%B4%D1%83%D0%BB:Wikidata&diff=prev&oldid=3689944
https://lv.wikipedia.org/w/index.php?title=Modulis:Wikidata&diff=prev&oldid=2816246
https://lt.wikipedia.org/w/index.php?title=Module:Wikidata&diff=prev&oldid=5336632
https://pt.wikivoyage.org/w/index.php?title=M%C3%B3dulo:Wikidata&diff=prev&oldid=123281
https://hi.wikipedia.org/w/index.php?title=Module:Wikidata&diff=prev&oldid=3728109
https://en.wiktionary.org/w/index.php?title=Module:wikidata&diff=prev&oldid=49084837
https://bs.wikipedia.org/w/index.php?title=Modul:Wikidata&diff=prev&oldid=2914902

I wrote a script and fixed all of them in one go, except the ones that were protected.

ladsgroup@terbium:~$ mwgrep --module "otherwise, iterate over all properties, fetch their labels and compare this to the given property name" --max-results 200
arwiki              Module:Wikidata
bgwiki              Module:Wikidata
iawiki              Module:Wikidata
idwiki              Module:Wikidata
incubatorwiki       Module:Wp/lrc/Wikidata
kuwiki              Module:Wikidata
mywiki              Module:Wikidata
ptwiki              Module:Wikidata
thwiki              Module:Wikidata
viwiki              Module:Wikidata
zhwiki              Module:Wikidata
zhwiki              Module:Wikidata2
zhwiki              Module:沙盒/Dabao qian/Wikidata
zhwiki              Module:沙盒/PhiLiP/Wikidata

I think getting this done is good because these wikis would then get only the relevant notifications, instead of getting one for every edit that happens on the statements.

Ladsgroup moved this task from Blocked to Done on the Wikidata-Ministry-Of-Magic board.

Only six wikis had this problem: ia, pt, my, bg, ku, th. I left a message on the talk page of each of them, and ptwiki, iawiki and mywiki have been fixed now \o/ Only three small wikis are left.