track usage of Wikibase Lua functions
Closed, Resolved, Public

Description

We'd like to see how much each of our Lua functions is used and how that changes over time. The first use we are interested in is how many times they are written in the source code of a Lua module across all Wikimedia projects.

https://www.mediawiki.org/wiki/Extension:Wikibase_Client/Lua has a list of functions we are interested in.
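As a rough illustration of the source-code counting described above, here is a minimal Python sketch that counts how often each tracked function name is written in a module's Lua source. The three names in `TRACKED_FUNCTIONS` are only an illustrative subset (the full list is on the page linked above), and `count_function_mentions` is a hypothetical helper, not part of any existing tool.

```python
import re
from collections import Counter

# Illustrative subset only; the authoritative list is on the wiki page above.
TRACKED_FUNCTIONS = [
    "mw.wikibase.getEntity",
    "mw.wikibase.getLabel",
    "mw.wikibase.getDescription",
]


def count_function_mentions(lua_source, names=TRACKED_FUNCTIONS):
    """Count textual occurrences of each fully qualified name in Lua source."""
    counts = Counter()
    for name in names:
        # Require that the match is not part of a longer identifier,
        # e.g. getEntity must not match getEntityIdForCurrentPage.
        pattern = r"(?<![\w.])" + re.escape(name) + r"(?!\w)"
        counts[name] = len(re.findall(pattern, lua_source))
    return counts


sample = (
    'local e = mw.wikibase.getEntity("Q42")\n'
    'local l = mw.wikibase.getLabel("Q42")\n'
    "local id = mw.wikibase.getEntityIdForCurrentPage()\n"
    "local cur = mw.wikibase.getEntity()\n"
)
mentions = count_function_mentions(sample)
```

A plain text match like this misses aliased calls (for example `local wb = mw.wikibase` followed by `wb.getLabel(...)`) and also counts mentions inside comments, so it gives an approximation rather than an exact count.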

Event Timeline

I can pick this one up if someone can provide an introduction to Lua modules usage in Wikimedia projects for me.

The current task description does not provide enough information on how to perform the task. Where is the data? How does one learn which Lua module is used in which project, for example?

I can see that it is easy to get to the source code of the Lua modules. However, if the point is just a full-text search for function names, that sounds too trivial; I guess you had something more complicated in mind. Please advise or reformulate the task in more precise terms.

I talked about this with @Lydia_Pintscher today and we agreed that the most viable option is to simply track the number of calls to each Lua function as they happen. Tracking the number of function calls per page cannot easily be supported by our current infrastructure (attaching this information to pages or parser cache entries in an easily queryable way is hard).

The nicest way to do this that I can currently see is to create a new PHP callback for both mw.wikibase and mw.wikibase.entity that wraps StatsdService::increment. This callback could then be called, with the respective key, inside every Lua function that is relevant to us.

You can use https://gerrit.wikimedia.org/r/c/mediawiki/extensions/Wikibase/+/437629/4/client/includes/DataAccess/Scribunto/Scribunto_LuaWikibaseLibrary.php as an example where a new callback has been added for use in Lua (getReferencedEntityId).
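To illustrate the idea of wrapping a counter increment around each tracked function, here is a minimal sketch in Python rather than the PHP and Lua that Wikibase actually uses. Everything in it is an assumption made for illustration: `InMemoryStats` stands in for a statsd client, `tracked` is a hypothetical decorator, and the metric key is invented, not the key used by the real patch.

```python
import functools
from collections import Counter


class InMemoryStats:
    """Stand-in for a statsd client; it only counts increments in memory."""

    def __init__(self):
        self.counters = Counter()

    def increment(self, key):
        self.counters[key] += 1


def tracked(stats, key):
    """Return a decorator that increments `key` on every call."""

    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            stats.increment(key)  # count the call before delegating
            return func(*args, **kwargs)

        return wrapper

    return decorator


stats = InMemoryStats()

# Hypothetical metric key, chosen only for this example.
LABEL_KEY = "example.wikibase.getLabel.call"


@tracked(stats, LABEL_KEY)
def get_label(entity_id):
    # Placeholder body; a real implementation would look the label up.
    return f"label of {entity_id}"


first = get_label("Q42")
second = get_label("Q64")
```

After the two calls, `stats.counters[LABEL_KEY]` is 2. The point of the pattern is that counting happens at call time, so no per-page bookkeeping is needed.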

@hoo Thanks! I will get in touch with you ASAP on this.

@hoo @Lydia_Pintscher Regarding this ticket:

@hoo Thank you very much for your instructions! However, I am a contractor Data Scientist for WMDE, which in effect means: I can work with R (what we use for analytics at WMDE), MATLAB, Octave, and some Python for analytics (NumPy, Pandas, scikit-learn and similar), but I have no knowledge of PHP. I can barely recognize where a function in PHP begins and where it ends :) I typically access the data sets that need to be analyzed from an RDBMS or Hadoop (Hive) in the WMF Data Lake.

@Lydia_Pintscher If anyone who does PHP with us can follow the instructions that were provided in T191416#4485854 to produce the data set (anything will do: an SQL table somewhere, tab separated, comma separated, whatever), then I could use that data set to provide any analytics that we need from it.

Addshore triaged this task as Medium priority.Oct 8 2018, 10:56 AM

@Lydia_Pintscher: could you update the task description to state what is wanted here? It looks like what the comment above (T191416#4485854) says is quite different from what the original description asked for.
When the description is updated, it would be good to state clearly what is expected to be done here (acceptance criteria), as the topic is apparently quite broad in general.

Change 477971 had a related patch set uploaded (by Hoo man; owner: Hoo man):
[mediawiki/extensions/Wikibase@master] Track usage of Wikibase Lua functions

https://gerrit.wikimedia.org/r/477971

Change 477971 merged by jenkins-bot:
[mediawiki/extensions/Wikibase@master] Track usage of Wikibase Lua functions

https://gerrit.wikimedia.org/r/477971

I suggest enabling this tracking per group for all groups (wikipedia, wikiquote, …) and for the following wikis specifically:

  1. commonswiki
  2. ruwiki
  3. ukwiki
  4. arwiki
  5. zhwiki
  6. cawiki
  7. frwiki
  8. enwiki
  9. svwiki
  10. itwiki

These are the top 10 wikis by amount of entity usage (per https://grafana.wikimedia.org/d/000000160/wikidata-entity-usage?refresh=5m&orgId=1&from=1536854438912&to=1544634038914).
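The suggestion above (track per group everywhere, plus per wiki for a short list) could be sketched roughly as follows. This is a hypothetical Python illustration, not the actual mediawiki-config change: `metric_keys`, `PER_WIKI_TRACKED`, and the key format are all assumptions made for this example.

```python
# The ten wikis listed above; per-wiki tracking applies only to these.
PER_WIKI_TRACKED = {
    "commonswiki", "ruwiki", "ukwiki", "arwiki", "zhwiki",
    "cawiki", "frwiki", "enwiki", "svwiki", "itwiki",
}


def metric_keys(function_name, wiki_id, wiki_group):
    """Return the counter keys to increment for one Lua function call.

    Every call is counted under its wiki group; calls on specifically
    listed wikis are additionally counted under the wiki's own id.
    The key format is hypothetical.
    """
    keys = [f"{wiki_group}.{function_name}"]
    if wiki_id in PER_WIKI_TRACKED:
        keys.append(f"{wiki_id}.{function_name}")
    return keys


enwiki_keys = metric_keys("getLabel", "enwiki", "wikipedia")
other_keys = metric_keys("getLabel", "nlwiki", "wikipedia")
```

Limiting per-wiki keys to a short list keeps the number of distinct metrics small while still giving detail for the wikis with the most entity usage.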

hoo removed a project: Patch-For-Review.

Still needs to be enabled via mediawiki-configuration.

Sounds good. I'd love to have dewiki added as well if possible.

Change 479407 had a related patch set uploaded (by Hoo man; owner: Hoo man):
[operations/mediawiki-config@master] WikibaseClient: Enable Lua function usage tracking

https://gerrit.wikimedia.org/r/479407

This is scheduled to be enabled on Dec 20 after the train (probably around 19:00–20:00 UTC).

Change 479407 merged by Kaldari:
[operations/mediawiki-config@master] WikibaseClient: Enable Lua function usage tracking

https://gerrit.wikimedia.org/r/479407

Still needs something to display the tracking :)

@Addshore

  • if you mean "we need a Grafana dashboard", please provide an .MD file describing the metrics;
  • if you mean Shiny, please let me know which schema I need to access.

I think this can be closed. T211768 is for the visualization. Or is there a reason this is still in Doing?