Provide an exact, working definition of a Wikidata usage statistic.
The scope of the definition is: item usage count per page, per project.
Current status. The "O", "L", "S", "T", and "X" usage aspects are not mutually exclusive, and the mapping across them does not seem to be one-to-one: the "S", "T", and "O" aspects do not overlap with one another, but each of them may overlap with "X". The "X" aspect nevertheless cannot be dropped from the definition, since a lot of important information (usage in Infoboxes, for example) would be lost.
The description of how the usage aspects are tracked in the wbc_entity_usage table contains too many hedges such as "... typically, this is the case ..." and "... that might be the case, but again ...". That is understandable: the table was simply not engineered with its possible future use in a Data Science project in mind, so we have to improvise with what we have.
Suggestion. For the initial version of the WDCM system, use a simplified definition of Wikidata usage that disregards multiple usages of the same item on a single page. In effect:
- count the number of pages in a project on which a particular Wikidata item occurs;
- take that count as the per-project usage statistic for that item;
- ignore usage aspects completely until a proper tracking of usage per-page is enabled in the future.
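To make the suggestion concrete, here is a minimal sketch of the simplified statistic. It assumes the usage rows have already been extracted as (project, entity_id, aspect, page_id) tuples; the tuple layout, the sample rows, and the function name usage_per_project are illustrative assumptions, not the actual WDCM code or the exact wbc_entity_usage schema.

```python
from collections import defaultdict

# Hypothetical sample rows: (project, entity_id, aspect, page_id).
# The values are made up for illustration only.
rows = [
    ("enwiki", "Q42", "S", 1),
    ("enwiki", "Q42", "X", 1),  # same item, same page, different aspect
    ("enwiki", "Q42", "T", 2),
    ("enwiki", "Q64", "L", 1),
    ("dewiki", "Q42", "O", 7),
]


def usage_per_project(rows):
    """Return {(project, entity_id): number of distinct pages using the item}.

    The aspect is ignored on purpose: a page counts once per item,
    no matter how many aspects are recorded for it.
    """
    pages = defaultdict(set)
    for project, entity_id, _aspect, page_id in rows:
        pages[(project, entity_id)].add(page_id)
    return {key: len(page_ids) for key, page_ids in pages.items()}


usage = usage_per_project(rows)
for (project, entity_id), n_pages in sorted(usage.items()):
    print(project, entity_id, n_pages)
```

Counting distinct pages rather than rows is what keeps overlapping aspects (for example "S" and "X" on the same page) from inflating the statistic.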
By "proper tracking of usage per-page" I mean the following:
- a methodology that counts *exactly* how many usage cases of a particular item, on a particular page, from a particular project there are.
Please comment.
Refer to the following discussion: https://docs.google.com/document/d/1kOhO3NwbheDLh-7s1IHqAcKd3zxB3to9hpfYQx3tbFw/edit?ts=597b25b2