
Explore ways to include various information in our logging payload
Closed, Resolved (Public)

Description

The following table describes additional information to be included on the Wikibase pingback (on top of what is already sent with MediaWiki Pingback): https://docs.google.com/spreadsheets/d/180piospJX1eBARCTRraIDPnZIzUEu7TaWYPeiyCr-tA/edit#gid=0

Investigate how feasible it would be to collect the following points of info:

  • Is Development / Test (potentially we could disable the pingback for dev and test systems)
  • Federation (Has a setting which we can read)
  • Extensions (Look into config file / Special:Version lists them out anyhow)
  • Items (Might be messy; needs further investigation. Ask Adam, Lucas, Marius or Amir)

Event Timeline

Pingback Datapoints: Notes, Pros & Cons

Is Development / Test <Boolean>
  • We can potentially track this by adding an additional flag
  • Pro: We get additional stats of current dev and test installs of WB
  • Con: Having to add an additional config variable increases complexity of use and can cause false positives from misconfigured wikis
  • If the goal is only to exclude these types of WB installations from our stats, it's probably best to just instruct users of these Wikibases to keep the pingback config off.
Is Federated <Boolean>
  • Since federation is enabled by a setting it should not be too difficult to report whether it is enabled
  • Pro: We can see which wikis utilize federation
  • Con: So far there is only one config setting we can use, and it relates to federated properties only. There is no overarching federation setting, which means the pingback will have to be updated each time a new federation feature is added
  • Seems like low-hanging fruit, as we can just check whether $wgWBRepoSettings['federatedPropertiesEnabled'] is set to true
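
The check described above could look roughly like the following. This is a Python stand-in rather than MediaWiki PHP: `repo_settings` models the `$wgWBRepoSettings` array, and the function name `is_federated` is hypothetical. Only the `federatedPropertiesEnabled` key comes from the notes above. Treating anything other than an explicit `True` as "not federated" is an assumption, made here to keep misconfigured wikis from showing up as false positives.

```python
def is_federated(repo_settings: dict) -> bool:
    """Report federation as enabled only when the flag is explicitly true."""
    # A missing key or a non-boolean value (e.g. the string "yes") counts
    # as "not federated" in this sketch.
    return repo_settings.get("federatedPropertiesEnabled") is True


# Hypothetical settings, for illustration only.
print(is_federated({"federatedPropertiesEnabled": True}))
print(is_federated({}))
```

If more federation settings are added later, each would need its own check here, which is the maintenance cost noted in the Con above.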
Enabled Extensions <List>
  • There is the possibility to get all enabled extensions and skins through the extension registry
  • Pro: Obtain stats of additional extension usage over time
  • Con: Some of this info might be irrelevant, such as skins or extensions unrelated to Wikidata
  • It seems feasible to implement this feature using ExtensionRegistry::getInstance()->getAllThings(). We might even be able to filter the list of installed extensions by desired types, if need be, as each "Thing" also includes a type key (e.g. Wikibase extensions are of type => 'wikibase')
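
To illustrate the filtering idea, here is a minimal Python sketch. It assumes the registry output can be modelled as a mapping from extension name to a record carrying a `type` key, as described above. The function name `extensions_by_type` and the sample entries are hypothetical and not taken from a real installation.

```python
def extensions_by_type(all_things: dict, wanted_types=None) -> list:
    """Return sorted extension names, optionally restricted to given types."""
    names = []
    for name, info in all_things.items():
        # Entries without a "type" key only pass when no filter is requested.
        if wanted_types is None or info.get("type") in wanted_types:
            names.append(name)
    return sorted(names)


# Hypothetical sample in the shape described above (name => record with "type").
sample = {
    "ExampleWikibaseExtension": {"type": "wikibase"},
    "ExampleSkin": {"type": "skin"},
    "ExampleOther": {"type": "other"},
}

print(extensions_by_type(sample, {"wikibase"}))
print(extensions_by_type(sample))
```

Filtering before sending would also address the Con above, since skins and unrelated extensions could be dropped from the payload.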
Number of Items <Integer>
  • The simplest way to obtain this number seems to be to count all pages in the items namespace
  • Pro: Will obtain the number of items in Wikibases over time
  • Con: This requires us to run SQL queries against the user's database, which might cause issues
  • Since there is no running tally of items, the way to obtain the number of items is to run a query similar to the one below. With the introduction of T195792 in the future, we could possibly mitigate server issues by introducing a query timeout.
SELECT COUNT(*) FROM page WHERE page_namespace = <item namespace>;
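
As a rough illustration of "count with a timeout", the sketch below runs the same kind of query against an in-memory SQLite database from Python. This is only a stand-in: it does not use MediaWiki's database layer or whatever T195792 ends up providing. The function name `count_items`, the two-column `page` table, and the namespace number 120 are assumptions for the demo. The timeout is emulated with SQLite's progress handler.

```python
import sqlite3
import time


def count_items(conn, item_namespace, timeout_seconds=1.0):
    """Count pages in the item namespace; return None if the query is aborted."""
    deadline = time.monotonic() + timeout_seconds
    # SQLite calls the handler every 1000 VM instructions; a non-zero
    # return value aborts the running query.
    conn.set_progress_handler(
        lambda: 1 if time.monotonic() > deadline else 0, 1000
    )
    try:
        row = conn.execute(
            "SELECT COUNT(*) FROM page WHERE page_namespace = ?",
            (item_namespace,),
        ).fetchone()
        return row[0]
    except sqlite3.DatabaseError:
        # Aborted (or otherwise failed) query: report nothing instead of failing.
        return None
    finally:
        conn.set_progress_handler(None, 1)  # remove the handler again


# Demo data; the namespace numbers are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE page (page_id INTEGER PRIMARY KEY, page_namespace INTEGER)"
)
conn.executemany(
    "INSERT INTO page (page_namespace) VALUES (?)",
    [(120,), (120,), (120,), (122,), (0,)],
)
print(count_items(conn, 120))
```

Returning None on an aborted query lets the pingback omit the data point rather than block or fail on a large wiki.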

Indeed, counting all items on a growing Wikibase can become a problem.
If we want this information in order to determine how big a Wikibase is, we can instead get the count of all pages, which is a much less expensive operation because of the structure of the database.

Proposed query:

SELECT TABLE_ROWS FROM information_schema.TABLES WHERE TABLE_NAME = 'page'

@Samantha_Alipio_WMDE Question to product: Will it still be useful if we record the number of all pages (that is, all entities and their talk pages) instead of the number of items? Or would it be useful to determine whether a Wikibase is populated by checking if it has more than N pages?

we can instead get the count of all pages,

This possibility was proposed in the Mattermost discussion we had with @Addshore in the Wikibase channel. Keep in mind, though, that this will probably count all types of content pages and not just Items, as the metric requires. See the full discussion here: https://mattermost.wikimedia.de/swe/pl/ty8s57ehwjn73cspmuk9x9oruh

ItamarWMDE moved this task from Peer Review to Done on the Wikibase Pingback board.

Setting task status to resolved as it seems there is no work left in this task. (Please update the task status once a task is done - thanks a lot!)