As a product manager for Wikidata and WDQS, I want to know what quantifiable benefits to service reliability and quality I might expect to gain (or lose) by splitting scholarly articles out from the Wikidata graph, so that I can decide whether to move ahead with this plan and how to communicate it.
To move ahead with splitting scholarly articles out of Wikidata, communicate this decision, and set expectations about the benefits of the change, we should get baseline measurements of the current state of scholarly articles in Wikidata and WDQS, and estimates of the effects of splitting them off.
Get the numbers for the following metrics:
- percentage, number of Wikidata entities that are scholarly articles
- number of triples in Wikidata related to scholarly articles
- percentage, number of WDQS queries per month that involve scholarly articles (including authors and publications)
- percentage, number of the above queries that only involve scholarly articles (including authors and publications)
- percentage, number of scholarly articles that are connected to Wikidata items other than scholarly articles (not including authors and publications)
- given the current rate of growth of Wikidata, approximately how much time it would take for Wikidata to grow back to its current size if we removed scholarly articles
- rate of growth of scholarly articles
- number of authors that were probably added solely so that they could be referenced from scholarly articles (i.e., separating scholarly articles would also leave these author items isolated)
- number of authors connected to other subgraphs in Wikidata vs. only connected to scholarly articles
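
For the "time to grow back" metric, a minimal sketch of the arithmetic, assuming we already have the current graph size, the size of the scholarly-article portion, and a measured growth rate. The function names and the two growth models (constant absolute growth vs. constant percentage growth) are assumptions for illustration; this task does not say which model fits Wikidata, and no real measurements are included here.

```python
import math


def months_to_regrow_linear(removed_size, growth_per_month):
    """Months to add back `removed_size` at a constant absolute growth per month."""
    if growth_per_month <= 0:
        raise ValueError("growth_per_month must be positive")
    return removed_size / growth_per_month


def months_to_regrow_compound(current_size, removed_size, monthly_rate):
    """Months to return to `current_size` at a constant percentage growth per month.

    Solves remaining * (1 + monthly_rate) ** t == current_size for t,
    where remaining = current_size - removed_size.
    """
    remaining = current_size - removed_size
    if remaining <= 0 or monthly_rate <= 0:
        raise ValueError("need remaining size > 0 and monthly_rate > 0")
    return math.log(current_size / remaining) / math.log(1 + monthly_rate)


# Placeholder inputs only, not measured values:
# remove 40 units from the graph, which grows by 2 units per month.
print(months_to_regrow_linear(40, 2))  # 20.0
```

Sizes can be counted in triples or in entities, as long as the same unit is used for the removed portion and the growth rate. Whether the growth rate should exclude new scholarly articles after the split is a separate decision that changes the estimate.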