To improve WDQS (particularly in terms of scaling), we need a better understanding of our users and their use cases.
* Define the questions we want answered from the data we have
* Implement additional data collection if needed
* Work with analysts to answer those questions
**Various questions consolidated from Search Platform virtual offsite**
* 2% of queries are taking 95% of the server time: what are those 2% of queries doing? Can or should we restrict them? Are those broken bot queries, or actually valuable?
* What are the most expensive user agents? Can we identify heavy users and work with them to reduce that load? (A log-analysis sketch covering these first two questions follows this list.)
* What percentage of queries, and which kinds of queries, care about the freshness of the data?
* How important is it to have the full graph available to answer the questions people are asking? Can we infer that from the queries plus the data?
* Do we have strongly connected components in the Wikidata graph? Could they be used to split the graph into subgraphs? (See the second sketch below.)
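
As a starting point for the first two questions, here is a minimal analysis sketch, assuming the query log can be exported as a CSV with `query`, `user_agent`, and `duration_ms` columns. The file name and schema are assumptions for illustration, not the real WDQS log format:

```python
# Hypothetical sketch: aggregate server time by user agent and estimate how
# much of the total time the slowest queries consume. The log file name and
# column names are assumptions, not the actual WDQS log schema.
import pandas as pd

logs = pd.read_csv("wdqs_query_log.csv")  # assumed: query, user_agent, duration_ms
total_time = logs["duration_ms"].sum()

# Which user agents account for the most server time?
by_agent = (
    logs.groupby("user_agent")["duration_ms"]
    .sum()
    .sort_values(ascending=False)
)
print("Share of server time per user agent:")
print((by_agent / total_time).head(10))

# How much of the total time do the slowest 2% of queries consume?
cutoff = logs["duration_ms"].quantile(0.98)
slow = logs[logs["duration_ms"] >= cutoff]
print(f"Top 2% of queries use {slow['duration_ms'].sum() / total_time:.0%} of server time")
```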
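For the strongly-connected-components question, here is a toy sketch using networkx on made-up item-to-item statements. The real graph (billions of triples) would need a distributed computation, but the idea carries over:

```python
# Toy sketch: treat each (subject, object) pair of an item-to-item statement
# as a directed edge, then look for strongly connected components. The
# triples below are invented for illustration.
import networkx as nx

triples = [
    ("Q1", "P361", "Q2"),  # hypothetical statements
    ("Q2", "P527", "Q1"),  # Q1 and Q2 point at each other -> one SCC
    ("Q3", "P31", "Q4"),
]

G = nx.DiGraph()
for subj, _pred, obj in triples:
    G.add_edge(subj, obj)

# Every SCC with more than one node is a set of items that cannot be
# separated into different subgraphs without cutting edges in both directions.
components = [c for c in nx.strongly_connected_components(G) if len(c) > 1]
print(components)  # e.g. [{'Q1', 'Q2'}]
```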
**Random notes from Search Platform virtual offsite**
* We need to pair someone who knows what to look for with someone who knows how to look for things (@Addshore and @JAllemandou?).
* Currently, we only log the queries. For search, we also log the results; something similar could help answer our questions, though that may be a lot of data. Even just logging the size of the response might be interesting (a sketch follows this list).
* If we can extract the entities used in queries, can we group them? If queries reference person A, person B, …, we should be able to infer that those queries are about people (see the entity-extraction sketch below).
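
On response size: a minimal sketch of measuring it client-side against the public WDQS endpoint. In production the same number would presumably be recorded server-side alongside the existing query log:

```python
# Sketch: even without logging full results, recording the response size of
# each query is cheap. This runs one query against the public WDQS endpoint
# and measures the size of the JSON response body.
import requests

ENDPOINT = "https://query.wikidata.org/sparql"
query = "SELECT ?item WHERE { ?item wdt:P31 wd:Q5 } LIMIT 10"

resp = requests.get(
    ENDPOINT,
    params={"query": query},
    headers={
        "Accept": "application/sparql-results+json",
        "User-Agent": "wdqs-analysis-sketch/0.1",  # placeholder UA string
    },
)
print(f"status={resp.status_code}, response_size={len(resp.content)} bytes")
```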
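And for entity grouping, a sketch that pulls entity and property IDs out of raw SPARQL text with a regex. Actually grouping "person" queries together would additionally require resolving each item's type (its P31 value), which this sketch does not do; the example queries are made up:

```python
# Sketch: extract Wikidata item (Q...) and property (P...) IDs from SPARQL
# query text. Frequent IDs hint at what queries are about, e.g. P106
# ("occupation") showing up often suggests queries about people.
import re
from collections import Counter

QID = re.compile(r"\bQ\d+\b")
PID = re.compile(r"\bP\d+\b")

queries = [  # made-up examples standing in for the real query log
    "SELECT ?x WHERE { wd:Q42 wdt:P106 ?x }",
    "SELECT ?x WHERE { wd:Q307 wdt:P106 ?x }",
]

entity_counts = Counter()
property_counts = Counter()
for q in queries:
    entity_counts.update(QID.findall(q))
    property_counts.update(PID.findall(q))

print(entity_counts)    # which items do people ask about?
print(property_counts)  # P106 dominating hints at person-centric queries
```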