Page MenuHomePhabricator

[6hr.] Investigation - Wikibase Lua: addStatementUsage seems unjustifiably expensive
Closed, ResolvedPublic

Description

As seen in the report below it appears as Scribunto_LuaSandboxCallback::addStatementUsage is taking a considerable amount of computation time lately.

A 6 Hour Timebox is required to investigate:

  1. What is causing this function to take longer lately?
  2. Can we continuously monitor this to be alerted to similar changes in the future? (Optional, can be potentially separated into a different task)

Initial Report:

The computing times increased drastically by about 100 %, mainly produced by a drastic increase of the computing time of the Lua function addStatementUsage (increase of about 500 %).

From the Lua profile of https://de.wikivoyage.org/wiki/Halle_(Saale) (see html source code):

Lua Profile:
    Scribunto_LuaSandboxCallback::addStatementUsage                 4540 ms       55.0%
    Scribunto_LuaSandboxCallback::callParserFunction                1640 ms       19.9%
    Scribunto_LuaSandboxCallback::getEntity                          540 ms        6.5%
    Scribunto_LuaSandboxCallback::addSiteLinksUsage                  420 ms        5.1%
    Scribunto_LuaSandboxCallback::getEntityStatements                320 ms        3.9%
    init <Modul:CountryData/Geography>                               100 ms        1.2%
    recursiveClone <mwInit.lua:41>                                   100 ms        1.2%
    ?                                                                100 ms        1.2%
    Scribunto_LuaSandboxCallback::gsub                                80 ms        1.0%
    init <Modul:VCard/Subtypes>                                       80 ms        1.0%
    [others]                                                         340 ms        4.1%

Maybe it is the same like T236485.

Event Timeline

Thank you, Roland.

Some questions:

  • Was this a sudden spike or a slow increase?
  • When did you see the change?
  • Is this only happening on this article or also others?

Some initial investigation I've done:

It seems that it is a sudden spike. The computing time only for addStatementUsage jumped from about 600 ms to 5000 ms (all Lua stuff from about 4.5 sec to about 9 sec). Within the last two weeks I added and optimized the analysis of opening hours to our vCard template. As in the past I used the same technique to prevent retrieving data from Wikidata. We are using several tables like this: https://de.wikivoyage.org/wiki/Modul:Hours/i18n with prepared data. On this way there is usually only one getEntity call per template. The number of entities used increased within the last two weeks only by about 1. Of course I added some statements to the Wikidata datasets (mainly of museums) to test the display of opening hours. Within the last two weeks there was no significant change. The core functions were added already on Sep. 26, 2020.

There is another article with the same behavior: https://de.wikivoyage.org/wiki/Eisenbahnmuseen_in_Europa. For this article, I got today in the morning a runtime error that there was not enough time.

Lua Profile:
    Scribunto_LuaSandboxCallback::addStatementUsage                 5580 ms       68.7%
    Scribunto_LuaSandboxCallback::addSiteLinksUsage                  800 ms        9.9%
    Scribunto_LuaSandboxCallback::callParserFunction                 720 ms        8.9%
    Scribunto_LuaSandboxCallback::getEntity                          320 ms        3.9%
    Scribunto_LuaSandboxCallback::addLabelUsage                      120 ms        1.5%
    Scribunto_LuaSandboxCallback::incrementStatsKey                  120 ms        1.5%
    Scribunto_LuaSandboxCallback::fullUrl                             80 ms        1.0%
    Scribunto_LuaSandboxCallback::gsub                                40 ms        0.5%
    init <Modul:CountryData/Geography>                                40 ms        0.5%
    ?                                                                 40 ms        0.5%
    [others]

I think there is a cache problem (see also T236749). And I assume that we exceeded a threshold with only minor additions. As already stated I think this bug is the same like T236485. The caching bug was not really solved but only downplayed. Therefore it was only a question of time that this bug reoccured.

There are only 166 entity calls (one per template) in the article of Halle (Saale) which is clearly less than the limit of 400. And I hope that we can use up to 250 Wikidata calls in future. Of course we use a lot of properties (in article mentioned about 50, maximally about 100) but we do not retrieve additional ones because of the tables mentioned. (I remember that you stated that Wikivoyage is the Wikimedia site which uses Wikidata in most extreme extent -- more than Commons. That is really true.)

I tried to remove the museum section from https://de.wikivoyage.org/wiki/Halle_(Saale) and to stop using qualifiers and references from entities but the bug remained.

But it seems that this bug depends on the number of entities loaded:

A week ago, the addStatementUsage computing time for the last case was about 600 ms (less than 10 %).

All statements are retrieved with entity:getBestStatements( p ) function in Module:Wikidata_utilities. The article entity is untouched.

If the bug is caused by cache limitations: Maybe a simple destructor entity:remove() would help. People told me that all entities are cached until the end of article parsing. But most of the entities are used in only one template and must not be cached any longer.

Mike_Peel and RexxS might be other people to ask if they've seen similar changes in their Lua usage.

@toan suggested getting old profiling reports from the Internet Archive; they have one Halle (Salle) snapshot from August 2020, where the Lua profile looks like this:

Lua Profile:
    Scribunto_LuaSandboxCallback::callParserFunction                2120 ms       41.7%
    Scribunto_LuaSandboxCallback::getEntity                          800 ms       15.7%
    Scribunto_LuaSandboxCallback::addStatementUsage                  580 ms       11.4%
    Scribunto_LuaSandboxCallback::getEntityStatements                360 ms        7.1%
    ?                                                                180 ms        3.5%
    Scribunto_LuaSandboxCallback::gsub                               140 ms        2.8%
    recursiveClone <mwInit.lua:41>                                   120 ms        2.4%
    init <Modul:VCard/Subtypes>                                       80 ms        1.6%
    Scribunto_LuaSandboxCallback::addSiteLinksUsage                   60 ms        1.2%
    (for generator)                                                   60 ms        1.2%
    [others]                                                         580 ms       11.4%
Number of Wikibase entities loaded: 157/400

Halle (Saale) currently has some seven hundred statement usages:

MariaDB [dewikivoyage]> SELECT COUNT(*) FROM wbc_entity_usage WHERE eu_page_id = (SELECT page_id FROM page WHERE page_namespace = 0 AND page_title = 'Halle_(Saale)') AND eu_aspect LIKE 'C%';
+----------+
| COUNT(*) |
+----------+
|      702 |
+----------+

If each of those corresponds to one addStatementUsage call, then they only cost ca. 6 ms per call.

The Wikivoyage template vCard can use up to 50 statements per template (see table ParWD at Module:VCard/Params, too). About 5 addStatementUsage calls per template on average are not really much even for an entity which is already loaded (loading takes only about 2.5 ms). Within the last weeks we added only one statement (P3025: open days). Therefore, the number of addStatementUsage calls increased only by a few percent.

Some days ago a single addStatementUsage call costs only about 1 ms but now 6 ms . The question arose: why this drastic change: more than 500 %? I have no idea why an access to an entity table already loaded is now as much expensive.

If we cannot solve this problem then Wikidata will become useless for Wikimedia projects especially for list articles. An optimized Wikidata access for Wikivoyage and other Wikimedia projects is a mission-critical goal. We need the data at Wikidata to easily maintain them for all Wikivoyage projects. The main problem at Wikivoyage is to use data not only in a single information box but at many places in text, too. And why we cannot use this data any longer as we did it within the last four years?

When I started to work on a Wikidata usage in vCard template four years ago I was really astonished by the huge computing time: about one minute per article (of course, the parsing was stopped before). That's why I am using a lot of helper tables to prevent additional Wikidata access for instance to get labels and country-specific data.

ItamarWMDE renamed this task from Wikibase Lua: addStatementUsage seems unjustifiably expensive to [6hr.] Investigation - Wikibase Lua: addStatementUsage seems unjustifiably expensive.Oct 20 2020, 7:50 AM
ItamarWMDE updated the task description. (Show Details)

Leaving T263999: Some lua-calls with language specified does not end up in formatterCache because it has similar buzz words but I don't know if they are related.

Yeah I'm suspecting this could be the culprit ( or significantly contributing to the slow-down ).

Looking at the code of vcard which in turn uses this module to make all the requests there is a lot of language dependent calls being made that would end up not using the cache and make this go a lot slower.

However no real insights into when that ended up being deployed and used on https://de.wikivoyage.org/wiki/Halle_(Saale).

It also does not really match what @RolandUnger writes here.

Some days ago a single addStatementUsage call costs only about 1 ms but now 6 ms . The question arose: why this drastic change: more than 500 %? I have no idea why an access to an entity table already loaded is now as much expensive.

I am not sure if T263999 is the real cause for this bug. But I cannot exclude this cause, or maybe both bugs have a joint cause elsewhere. addStatementUsage only registers the use of an entity and a property and has maybe nothing to do with functions in which a language was specified.

Today I added some additional code/maintenance to our vCard script to check the usage of getLabel and getLabelByLang Wikibase functions.

Both functions are not used in the article of Halle (Saale) but in the article of Eisenbahnmuseen in Europa. In any case, the bug still occurs. Otherwise properties with type monolingualtext are used. There are several maintenance categories to check usage of getLabel: getLabel used (temporarily added), label from Wikidata, unknown label, and opening-hours label from Wikidata.

Ok, so after looking at this yesterday and today I've tried to replicate my local environment to what has been and what currently is deployed on https://de.wikivoyage.org/wiki/Spezial:Version

LuaSandbox 3.0.3
Lua 5.1.5

Installing luasandbox for mediawiki-docker-dev with the following command.

apt-get install liblua5.1-0-dev

git clone https://gerrit.wikimedia.org/r/mediawiki/php/luasandbox.git

phpize && ./configure && make && sudo make install

# add this extension=luasandbox.so
nano /opt/docker/etc/php/php.ini

service php-fpm restart

In Scribunto Enginge.php comment out the following to get the function profile on line 71

		#if ( $t < 1.0 ) {
		#	return $ret;
		#}

rolling my development environment back to 1.36.0-wmf.8 for Wikibase, core, Scribunto

using a simple module such as

local p = {} --p stands for package

function p.testAddStatementUsage( frame, entityID, property )

    return mw.wikibase.getBestStatements ( 'Q1', 'P1' )
end

return p

addStatementUsage is only called once and the profile returns the following

<!-- 
NewPP limit report
Parsed by mediawiki.mw.localhost
Cached time: 20201021122924
Cache expiry: 86400
Dynamic content: false
Complications: []
CPU time usage: 0.032 seconds
Real time usage: 0.033 seconds
Preprocessor visited node count: 5/1000000
Post‐expand include size: 10/2097152 bytes
Template argument size: 0/2097152 bytes
Highest expansion depth: 2/40
Expensive parser function count: 0/100
Unstrip recursion depth: 0/20
Unstrip post‐expand size: 0/5000000 bytes
Lua time usage: 0.013/10.000 seconds
Lua memory usage: 525 KB/50 MB
Lua Profile:
    Scribunto_LuaSandboxCallback::getEntityStatements                 20 ms      100.0%
Number of Wikibase entities loaded: 1/250
-->
<!--
Transclusion expansion time report (%,ms,calls,template)
100.00%    0.000      1 -total
-->

Running it a couple of times Lua time usage seems vary between 0.011 - 0.014

Moving forward to 1.36.0-wmf.13 on Wikibase, core, Scribunto

I'm running the same script and now getting the following results.

<!-- 
NewPP limit report
Parsed by mediawiki.mw.localhost
Cached time: 20201021124843
Cache expiry: 86400
Dynamic content: false
Complications: []
CPU time usage: 0.046 seconds
Real time usage: 0.048 seconds
Preprocessor visited node count: 5/1000000
Post‐expand include size: 10/2097152 bytes
Template argument size: 0/2097152 bytes
Highest expansion depth: 2/40
Expensive parser function count: 0/100
Unstrip recursion depth: 0/20
Unstrip post‐expand size: 0/5000000 bytes
Lua time usage: 0.024/10.000 seconds
Lua memory usage: 526 KB/50 MB
Lua Profile:
    Scribunto_LuaSandboxCallback::getEntityStatements                 20 ms      100.0%
Number of Wikibase entities loaded: 1/250
-->
<!--
Transclusion expansion time report (%,ms,calls,template)
100.00%    0.000      1 -total
-->

Running it a couple of times it seems we indeed have an increase in the Lua time usage running the same scripts varying between 0.022 - 0.024

The computing times for Scribunto_LuaSandboxCallback::getEntity and Scribunto_LuaSandboxCallback::addStatementUsage did not change much, but only that of Scribunto_LuaSandboxCallback::addStatementUsage changed drastically. The number of properties fetched increased within the last four weeks only by one which cannot explain the increase of computing time of this simple function. Maybe there is a problem with the accumulator/cache or database.

I did some more tests with @Tarrow yesterday evening and seems this might just have been noise as we could not reproduce it again.

Looking at the profile of https://de.wikivoyage.org/wiki/Halle_(Saale) this morning there is a drastic decrease of time spent on Scribunto_LuaSandboxCallback::addStatementUsage did you find a solution yourself @RolandUnger?

<!-- 
NewPP limit report
Parsed by mw2295
Cached time: 20201022073715
Cache expiry: 86400
Dynamic content: true
Complications: []
CPU time usage: 5.968 seconds
Real time usage: 7.690 seconds
Preprocessor visited node count: 16145/1000000
Post‐expand include size: 1675261/2097152 bytes
Template argument size: 1214/2097152 bytes
Highest expansion depth: 14/40
Expensive parser function count: 143/500
Unstrip recursion depth: 0/20
Unstrip post‐expand size: 261546/5000000 bytes
Lua time usage: 4.489/10.000 seconds
Lua memory usage: 12.85 MB/50 MB
Lua Profile:
    Scribunto_LuaSandboxCallback::callParserFunction                1720 ms       38.4%
    Scribunto_LuaSandboxCallback::getEntity                          780 ms       17.4%
    Scribunto_LuaSandboxCallback::addStatementUsage                  360 ms        8.0%
    Scribunto_LuaSandboxCallback::getEntityStatements                260 ms        5.8%
    ?                                                                180 ms        4.0%
    recursiveClone <mwInit.lua:41>                                   160 ms        3.6%
    Scribunto_LuaSandboxCallback::gsub                               160 ms        3.6%
    Scribunto_LuaSandboxCallback::incrementStatsKey                   80 ms        1.8%
    Scribunto_LuaSandboxCallback::addSiteLinksUsage                   80 ms        1.8%
    Scribunto_LuaSandboxCallback::jsonEncode                          60 ms        1.3%
    [others]                                                         640 ms       14.3%
Number of Wikibase entities loaded: 162/400
-->
<!--
Transclusion expansion time report (%,ms,calls,template)
100.00% 6273.821      1 -total
 85.36% 5355.134    255 Vorlage:VCard
  5.76%  361.668      1 Vorlage:Quickbar_Ort
  3.52%  220.525      1 Vorlage:Mapframe
  2.01%  125.858      4 Vorlage:Marker
  0.71%   44.688      6 Vorlage:Buch
  0.45%   28.112      1 Vorlage:GeoData
  0.29%   18.192      1 Vorlage:IstInKat
  0.29%   17.898      1 Vorlage:Einzelnachweise
  0.21%   13.423      1 Vorlage:Class-3
-->

I was surprised, too. And I did not made any changes (yesterday morning I only added some maintenance categories). The computing times for addStatementUsage oscillate (between 400 and 900 ms) but they are clearly reduced. I think yesterday the 1.36.0-wmf.14 Mediawiki branch was deployed at Wikivoyage. Maybe that means to find the cause we had to look for it outside of Wikibase. Unless I'm very much mistaken addStatementUsage only makes a registration of entities and properties used. This should not be expensive. But I have no idea what happens at the registration and what went wrong.

As you can see: our main problem is Scribunto_LuaSandboxCallback::callParserFunction which takes between 40 and 50 % of total Lua time (250 expensive parser function calls). But this is a Kartographer problem and therefore another stuff.

We discussed the way forward for this task this morning and decided we will not spend more time investigate what could have been the root cause now ( even though it would be very satisfying ) but keep this ticket on the board and monitor if there any changes in the next deployments.

I checked it again now and the addStatementUsage part takes 860 ms. While this is still large and we should improve it I am going to close this ticket because the spike seems to be gone and have been an issue with a deployment.