As a product manager for Wikidata and WDQS, I want to know what quantifiable benefits to service reliability and quality I might expect to gain (or lose) by splitting Lexemes out from the Wikidata graph, so that I can decide whether to move ahead with this plan and how to communicate it.
To move ahead with splitting Lexemes out of Wikidata (WD), communicate this decision, and set expectations about the benefits of the change, we need baseline measurements of the current state of Lexemes in Wikidata and WDQS, as well as estimates of the effects of splitting them off.
Get the numbers for the following metrics:
- percentage, number of Wikidata entities that are Lexemes
- percentage, number of WDQS queries per month that involve Lexemes
- percentage, number of the above queries that involve only Lexemes (i.e. don't require anything from the larger Wikidata graph)
- percentage, number of Lexemes that are connected to non-Lexeme items in WD
- given the current rate of growth of Wikidata, approximately how long it would take for non-Lexeme Wikidata to grow back to the current total size of Wikidata (including Lexemes)
- an estimated upper limit on how many Lexemes there could be
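The regrowth estimate in the list above reduces to simple arithmetic once a growth model is chosen. The sketch below assumes one of two models: a constant absolute growth rate (entities added per month), or a constant relative monthly growth rate. The function names and the example figures are hypothetical and illustrative, not measured values for Wikidata.

```python
import math


def linear_regrowth_months(lexemes: int, monthly_growth: float) -> float:
    """Months for non-Lexeme Wikidata to regain `lexemes` entities,
    assuming a constant absolute growth rate (entities per month)."""
    if monthly_growth <= 0:
        raise ValueError("monthly_growth must be positive")
    return lexemes / monthly_growth


def compound_regrowth_months(total: int, lexemes: int, monthly_rate: float) -> float:
    """Months for the non-Lexeme graph (total - lexemes) to grow back to
    `total`, assuming a constant relative growth rate per month
    (e.g. 0.01 for 1% per month)."""
    if not 0 <= lexemes < total:
        raise ValueError("need 0 <= lexemes < total")
    if monthly_rate <= 0:
        raise ValueError("monthly_rate must be positive")
    remaining = total - lexemes
    # Solve remaining * (1 + r)^t = total for t.
    return math.log(total / remaining) / math.log(1 + monthly_rate)


if __name__ == "__main__":
    # Hypothetical figures, for illustration only.
    print(linear_regrowth_months(lexemes=1_000_000, monthly_growth=250_000))
    print(compound_regrowth_months(total=121, lexemes=21, monthly_rate=0.1))
```

The linear model is the simpler baseline; the compound model is worth comparing only if Wikidata's growth is better described as a percentage of its current size.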
Summary of results from this ticket: https://docs.google.com/document/d/1N2ludK2QllzndrlQiQ7c6V1dT3NZBiABQL_kZH1P5Io/edit?usp=sharing