WDCM: Wikidata Usage Statistics Definition
Closed, Resolved (Public)

Description

Provide an exact, working definition of a Wikidata usage statistic.

The scope of the definition is: item usage count on page, per project.

Current status. The "O", "L", "S", "T", and "X" usage aspects are not mutually exclusive, and the mapping across the aspects does not seem to be one-to-one: the "S", "T", and "O" aspects do not overlap with each other, but each of them may overlap with "X". On the other hand, the "X" aspect cannot be dropped from the definition, since much important information (usage in infoboxes, for example) would be lost.

There are too many "... typically, this is the case ...", "... that might be the case, but again ..." and similar expressions in the definition of how the usage aspects are tracked in the wbc_entity_usage table. That is fine: the table was simply not engineered with its possible future use in a Data Science project in mind, so we have to improvise with what we have.

Suggestion. For the initial version of the WDCM system, use a simplified definition of Wikidata usage that ignores multiple usages of the same item on a single page; in effect:

  • count the number of pages in a project on which a particular Wikidata item occurs;
  • take that count as the per-project usage statistic for the item;
  • ignore usage aspects completely until proper tracking of usage per page is enabled in the future.
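
A minimal sketch of the suggested statistic, for illustration only. It assumes the usage rows have already been extracted from each project's wbc_entity_usage table as (project, entity_id, aspect, page_id) tuples; the function name and the sample rows are hypothetical and not part of WDCM. The aspect is read but deliberately ignored, so a page that uses an item under both "S" and "X" is counted once.

```python
from collections import defaultdict


def usage_per_project(rows):
    """Count, per (project, item), the number of distinct pages using the item.

    rows: iterable of (project, entity_id, aspect, page_id) tuples.
    The aspect is ignored, as in the simplified definition.
    """
    pages = defaultdict(set)
    for project, entity_id, _aspect, page_id in rows:
        # A set keeps each page once, however many aspects it is tracked under.
        pages[(project, entity_id)].add(page_id)
    return {key: len(page_ids) for key, page_ids in pages.items()}


# Hypothetical sample rows, invented for this example.
rows = [
    ("enwiki", "Q42", "S", 1),
    ("enwiki", "Q42", "X", 1),  # same page, different aspect: not counted twice
    ("enwiki", "Q42", "L", 2),
    ("dewiki", "Q42", "T", 7),
    ("enwiki", "Q64", "O", 1),
]
usage = usage_per_project(rows)
```

Counting rows instead of distinct pages would give 3 for Q42 on enwiki here, which is the kind of overcounting that the overlap between "X" and the other aspects produces.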

By "proper tracking of usage per-page" I mean the following:

  • a methodology that counts *exactly* how many usage cases of a particular item, on a particular page, from a particular project there are.

Please comment.

Refer to the following discussion: https://docs.google.com/document/d/1kOhO3NwbheDLh-7s1IHqAcKd3zxB3to9hpfYQx3tbFw/edit?ts=597b25b2

Event Timeline

Ok. Because the scaling of the WDCM Search and Process modules must proceed in order to have a testable, working version of the system implemented by early autumn, I will proceed with the following definition of Wikidata usage until we come up with something better:

For the initial version of the WDCM system, use a simplified definition of Wikidata usage that ignores multiple usages of the same item on a single page; in effect:

  • count the number of pages in a project on which a particular Wikidata item occurs;
  • take that count as the per-project usage statistic for the item;
  • ignore usage aspects completely until proper tracking of usage per page is enabled in the future.

The reason I am proceeding with this without engaging in further discussion is simply that the discussion would need to be much more elaborate. Let's get things done: this working definition is not perfect, but it will do the job for now.

GoranSMilovanovic lowered the priority of this task from High to Medium. Jul 28 2017, 11:01 PM
GoranSMilovanovic lowered the priority of this task from Medium to Low. Nov 1 2017, 10:07 AM
GoranSMilovanovic changed the task status from Open to Stalled. Nov 26 2017, 2:34 PM
  • The changes in the usage aspect recorded in the wbc_entity_usage wikibase schema now imply a different approach to this.
  • The current definition (which defines 'item usage' as the number of pages that mention the item) will be kept for all standard WDCM dashboards.
  • The specialization with respect to usage aspects will take place on the new dashboards (e.g. T187396).
  • Resolving the ticket.