Page MenuHomePhabricator

[Tracking] Fine-grained change notifications based on tracking from Lua getters via __index
Closed, ResolvedPublic

Description

Use __index to lazy register entity usages when community code calls mw.wikibase.entity.

We should only track an all usage for an entity table loaded via [[ https://www.mediawiki.org/wiki/Extension:Wikibase_Client/Lua#mw.wikibase.getEntity | mw.wikibase.getEntity ]], if the data on it is actually used. This can (later on?) be expanded, along with T151717, to track the most specific usages only.

Steps to get this enabled in production:

  • Merge T178153: make disabledUsageAspects coarse graining
  • Deploy T178153 (27cb7f6e1615db26460d035bc8ee372f1ec15f8e) AND MAKE SURE IT IS NOT GOING TO BE REVERTED (it should be in wmf.n and wmf.(n-1) before continuing)
  • Change the disabledUsageAspects setting: 'disabledUsageAspects' => [ 'D' => 'O', 'C' => 'O' ] (disabled description+statement usages) / 'disabledUsageAspects' => [ 'D' => 'O' ] (disabled description usages).
  • Enable this change (fine grained Lua tracking) in the settings for all statement usage test wikis. This step needs to be done before the next: Otherwise some/ most of the statement usages from the test wikis will disappear again.
  • Use compact representation of diffs in EntityChange. (T113468)
  • Make AffectedPagesFinder take description usage into account (T176417)
  • Make AffectedPagesFinder take statement usage into account (T176413)
  • https://gerrit.wikimedia.org/r/#/c/371651 got reverted and must be re-submitted (T179923)
  • Deploy this change
  • Gradually enable this on all wikis (T184322)

Note: This is similar to T76156: [Story] mw.wikibase: Use __index to lazy load entity contents.

Related Objects

Event Timeline

There are a very large number of changes, so older changes are hidden. Show Older Changes

If I'm not missing something this is pretty easy to be done by generalizing the claims trick for labels/sitelinks/descriptions.

see patch: https://gerrit.wikimedia.org/r/#/c/371651

Change 371651 had a related patch set uploaded (by Legoktm; owner: Eranroz):
[mediawiki/extensions/Wikibase@master] xkill - lazy track labels/sitelinks/claims

https://gerrit.wikimedia.org/r/371651

As an aside, in the case of commons, it might make sense to always record label usage as being for all languages, instead of which specific language, since commons uses weird {{int:...}} hacks for multilingualness, which won't record all the usages for non-canonical language parses (OTOH, that's an issue commons has had with recording link usages for a very long time, so its not like non-canonical parse link tracking issues is anything new to commons)

As an aside, in the case of commons, it might make sense to always record label usage as being for all languages, instead of which specific language, since commons uses weird {{int:...}} hacks for multilingualness, which won't record all the usages for non-canonical language parses (OTOH, that's an issue commons has had with recording link usages for a very long time, so its not like non-canonical parse link tracking issues is anything new to commons)

Indeed, this is tracked as T173196: Client term access in user language should track all labels/ descriptions as being used on multilingual wikis already.

What's the status of this? With usage tracking balooning on several sites, this seems rather urgent!

@eranroz It seems to me this is becoming rather urgent. If you are not available to work on the patch, someone may talk over to get the patch merged. I hope this is ok for you.

@Lydia_Pintscher We really should fix this asap, it should help a lot with the RC spam problem. I'm putting this into the sprint.

@hoo Do I understand correctly that all this needs is a feature flag? Do we need more eyes on the code?

@daniel , I'll update the code soon per the code review comments.
AFAIK it is currently blocked for deployment by T151717#3675687.

@eranroz I see no blocker, T151717#3675687 is only relevant for enabling per-property statement usage tracking, which has a separate feature switch. Avoiding X tracking when no statements are used, but only labels and/or sitelinks, would still be very helpful. There is potential for increasing the usage tracking table (because we may track L.de and L.en instead of just X), but I don't expect that increase to be explosive - and it should be offset by a dramatic reduction of inserts into recentchanges. We should have an eye on it, though.

Please implement as I documented on gerrit: Have a feature flag which switches between X+C usages and no X usage, but dynamic tracking. That way we can enable this wiki per wiki.

Looking at things like https://commons.wikimedia.org/w/index.php?title=File:Vincent_Willem_van_Gogh_110.jpg&action=info it seems like we really should get this done ASAP.

Nice example,I get your point :)
Please go over the last patch (adds descriptions + feature flag and its documentation). Also note that once deployed it should disabled in wmf-config and enabled per wiki as it may increase the wbc_entity_usage row number.

Change 386408 had a related patch set uploaded (by Ladsgroup; owner: Amir Sarabadani):
[operations/mediawiki-config@master] Make disabled usage aspect use the new config

https://gerrit.wikimedia.org/r/386408

Change 386408 merged by jenkins-bot:
[operations/mediawiki-config@master] Make disabled usage aspect use the new config

https://gerrit.wikimedia.org/r/386408

Mentioned in SAL (#wikimedia-operations) [2017-10-25T18:36:30Z] <niharika29@tin> Synchronized wmf-config/InitialiseSettings.php: Make disabled usage aspect use the new config T172914 (duration: 00m 50s)

Change 386421 merged by jenkins-bot:
[operations/mediawiki-config@master] Enable fine grained usage tracking on statement usage wikis

https://gerrit.wikimedia.org/r/386421

Mentioned in SAL (#wikimedia-operations) [2017-10-25T18:45:44Z] <niharika29@tin> Synchronized wmf-config/InitialiseSettings.php: Enable fine grained usage tracking on statement usage wikis T172914 (duration: 00m 50s)

Change 386466 had a related patch set uploaded (by Hoo man; owner: Eranroz):
[mediawiki/extensions/Wikibase@wmf/1.31.0-wmf.5] xkill - lazy track labels/sitelinks/claims

https://gerrit.wikimedia.org/r/386466

Change 371651 merged by jenkins-bot:
[mediawiki/extensions/Wikibase@master] xkill - lazy track labels/sitelinks/claims

https://gerrit.wikimedia.org/r/371651

Change 386466 abandoned by Hoo man:
xkill - lazy track labels/sitelinks/claims

https://gerrit.wikimedia.org/r/386466

It slipped my mind that deploying this now would mean we sometimes track C or D usages alone, which we don't yet support.

Added the remaining steps to the road map. I guess this will be blocked for a while… :(

I reverted the already merged change, feel free to upload a "revert revert" for later.

thiemowmde renamed this task from mw.wikibase.entity: Use __index to lazy register entity usages to [Tracking] Lazy register fine-grained usage tracking in generic Lua getters via __index.Oct 26 2017, 2:49 PM
thiemowmde updated the task description. (Show Details)
thiemowmde moved this task from incoming to in progress on the Wikidata board.
Lucas_Werkmeister_WMDE renamed this task from [Tracking] Lazy register fine-grained usage tracking in generic Lua getters via __index to [Tracking] Fine-grained usage tracking in generic Lua getters via __index.Nov 7 2017, 1:14 PM
daniel renamed this task from [Tracking] Fine-grained usage tracking in generic Lua getters via __index to [Tracking] Fine-grained change notifications based on tracking from Lua getters via __index.Nov 21 2017, 1:56 PM

During todays discussion round there was huge confusion what this ticket here should be about: Is it exclusively about Lua? If so, why? Shouldn't this be about fine-grained usage tracking and change notifications in general? And what does "fine-grained" include? Tracking statements and descriptions as well?

Here is my attempt to answer these questions:

  • The patch https://gerrit.wikimedia.org/r/392041 (a resubmission of the original https://gerrit.wikimedia.org/r/371651) is incomplete. There are two ways forward:
    1. Via a feature flag: The patch could be expanded to (temporarily) understand the disabledUsageAspects setting. This will allow us to merge the patch without having an effect in production. Only if the setting is removed in production, all code (both the parser functions as well as all Lua functions) will start tracking descriptions and statements individually.

IMO, disabledUsageAspects should be aspect of SqlUsageTracker/AffectedPagesFinder and not Lua related issue. (e.g if C is disabled - SqlUsageTracker should replace C to O when registering usage, and AffectedPagesFinder should replace C to O when getting notification)

Change 398823 had a related patch set uploaded (by Ladsgroup; owner: Amir Sarabadani):
[operations/mediawiki-config@master] Don't enable lua fine grained tracking for any wiki

https://gerrit.wikimedia.org/r/398823

By the next deployment of Wikidata, the first year of the next week, we will be able to properly track Wikidata usages from Lua (T179923) - so all dev blockers are solved, and it is up to configuration/operation.

Operation implications

  • wbc_entity_usage - expected to grow (any row with eu_aspect=X may be replaced with few rows), but AFAIK this should be only a slight growth (up to say x4, where All usage (X) is replaced with other (O)/label (L)/sitelink (S)/alias (A), and usually less). This is under assumptions that (T151717) is not deployed - e.g we dont track claims (they are just O usages), but eventually, this task becomes relevant too.
  • recentchanges - expected to see much less updates from wikidata (non relevant updates)

Suggested deployment plan
I suggest the following timeline for deployment:

  • T179923 - week of 2018-01-02 - enable it on cawiki, cewiki, elwiki, kowiki, trwiki (@Ladsgroup if this is OK from you and all stack holders, it means we can abandon https://gerrit.wikimedia.org/r/#/c/398823/ )
  • If everything works well, we can deploy the next week (week of 2018-01-08) to few large wikis for evaluations and if all rocks we can deploy everywhere.

Change 398823 merged by jenkins-bot:
[operations/mediawiki-config@master] Don't enable lua fine grained tracking for any wiki

https://gerrit.wikimedia.org/r/398823

Mentioned in SAL (#wikimedia-operations) [2018-01-03T20:19:15Z] <thcipriani@tin> Synchronized wmf-config/InitialiseSettings.php: SWAT: [[gerrit:398823|Do not enable lua fine grained tracking for any wiki]] T172914 (duration: 01m 02s)

Change 402823 had a related patch set uploaded (by Ladsgroup; owner: Amir Sarabadani):
[operations/mediawiki-config@master] Enable fine grained usage tracking in hewiki

https://gerrit.wikimedia.org/r/402823

Change 402823 merged by jenkins-bot:
[operations/mediawiki-config@master] Enable fine grained usage tracking in hewiki

https://gerrit.wikimedia.org/r/402823

Mentioned in SAL (#wikimedia-operations) [2018-01-08T14:56:10Z] <hashar@tin> Synchronized wmf-config/InitialiseSettings.php: Enable fine grained usage tracking in hewiki - T172914 (duration: 00m 50s)

Just after enabling (to be compared in few days)

use hewiki_p;
select eu_aspect,count(*) from wbc_entity_usage group by eu_aspect ;
+-----------+----------+
| eu_aspect | count(*) |
+-----------+----------+
| D.en      |        5 |
| D.he      |       46 |
| L         |        1 |
| L.en      |    61589 |
| L.he      |   518217 |
| O         |   280012 |
| S         |   293351 |
| T         |   354842 |
| X         |    24498 |
+-----------+----------+
select left(rc_timestamp,8), sum(if(rc_source='wb',1,0)) as WB, count(*),sum(if(rc_source='wb',1,0))/count(*) as RATIO from recentchanges group by left(rc_timestamp,8);
+----------------------+------+----------+--------+
| left(rc_timestamp,8) | WB   | count(*) | RATIO  |
+----------------------+------+----------+--------+
| 20171209             |  441 |     1613 | 0.2734 |
| 20171210             | 4343 |    10381 | 0.4184 |
| 20171211             | 3399 |     9134 | 0.3721 |
| 20171212             | 3452 |     8607 | 0.4011 |
| 20171213             | 2775 |     8048 | 0.3448 |
| 20171214             | 3901 |    10995 | 0.3548 |
| 20171215             | 3528 |     9773 | 0.3610 |
| 20171216             | 3112 |     7919 | 0.3930 |
| 20171217             | 3394 |     9882 | 0.3435 |
| 20171218             | 2893 |    10940 | 0.2644 |
| 20171219             | 3110 |    10066 | 0.3090 |
| 20171220             | 5545 |    12219 | 0.4538 |
| 20171221             | 4544 |    11888 | 0.3822 |
| 20171222             | 4170 |    10191 | 0.4092 |
| 20171223             | 3173 |     8475 | 0.3744 |
| 20171224             | 5318 |    10429 | 0.5099 |
| 20171225             | 2472 |     8507 | 0.2906 |
| 20171226             | 4209 |    10059 | 0.4184 |
| 20171227             | 4341 |    11177 | 0.3884 |
| 20171228             | 2860 |     9564 | 0.2990 |
| 20171229             | 4190 |    10294 | 0.4070 |
| 20171230             | 3676 |    10143 | 0.3624 |
| 20171231             | 6084 |    12698 | 0.4791 |
| 20180101             | 3622 |    11578 | 0.3128 |
| 20180102             | 4400 |    16172 | 0.2721 |
| 20180103             | 6242 |    12910 | 0.4835 |
| 20180104             | 5331 |    11774 | 0.4528 |
| 20180105             | 5896 |    10563 | 0.5582 |
| 20180106             | 5757 |    17901 | 0.3216 |
| 20180107             | 2871 |     8944 | 0.3210 |
| 20180108             | 1803 |     6516 | 0.2767 |
+----------------------+------+----------+--------+

After 2 weeks:

use hewiki_p;
select eu_aspect,count(*) from wbc_entity_usage group by eu_aspect ;
+-----------+----------+
| eu_aspect | count(*) |
+-----------+----------+
| D.en      |       17 |
| D.he      |      424 |
| L         |        1 |
| L.en      |    65293 |
| L.he      |   539313 |
| O         |   308782 |
| S         |   294197 |
| T         |   374019 |
| X         |       13 |
+-----------+----------+

X is almost eliminated

For recent changes: slightly less wb changes

`
| 20180108             | 2305 |     7906 | 0.2916 |
| 20180109             | 2214 |     7941 | 0.2788 |
| 20180110             | 2220 |     8984 | 0.2471 |
| 20180111             | 2119 |     7696 | 0.2753 |
| 20180112             | 1678 |     6434 | 0.2608 |
| 20180113             | 1479 |     6103 | 0.2423 |
| 20180114             | 1721 |     7928 | 0.2171 |
| 20180115             | 1399 |     7425 | 0.1884 |
| 20180116             | 1712 |     8769 | 0.1952 |
| 20180117             | 2994 |    11005 | 0.2721 |
| 20180118             | 1910 |     7166 | 0.2665 |
| 20180119             | 2393 |    10985 | 0.2178 |
| 20180120             | 1686 |     6439 | 0.2618 |
| 20180121             | 2189 |     7328 | 0.2987 |
| 20180122             | 4135 |     9622 | 0.4297 |

The average wikibase inseration ratio of the weeks before enabling xkill is 39% and after that it's 26%. I don't call it slight ;) Amazing work. Thanks!