The main thing Product-Analytics needs in order to adapt to T172410 (Replace the current multisource analytics-store setup) is a tool that lets us run queries against particular wiki replicas without having to think about which shard they are on.
Based on a discussion with @elukey, the best way to accomplish this would probably be a library (owned by Analytics) that handles routing the query to the specific shard, although we're open to any other method.
This library would have a signature something like: run(sql_query, wiki). It wouldn't need to provide for cross-wiki joins or a single staging database accessible from all shards; we are willing to handle those use cases in our own code as long as we have the core ability to run shard-agnostic queries.
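To make the request concrete, here is a minimal sketch of what such a shard-agnostic `run(sql_query, wiki)` might look like. Everything other than the `run(sql_query, wiki)` signature is an assumption for illustration: the `WIKI_TO_SHARD` table holds placeholder values rather than the real shard layout, and `connect_to_shard` is a hypothetical stand-in that opens an in-memory SQLite database instead of a real replica connection.

```python
import sqlite3

# Hypothetical wiki -> shard mapping. The shard names are placeholders,
# not the real layout; a real library would load this from configuration.
WIKI_TO_SHARD = {
    "enwiki": "shard_a",
    "dewiki": "shard_b",
    "frwiki": "shard_b",
}


def connect_to_shard(shard):
    """Stand-in for connecting to the replica host that serves `shard`.

    Assumption: a real implementation would resolve a host and port for the
    shard and select the wiki's database. This sketch just returns an
    in-memory SQLite connection so the example runs anywhere.
    """
    return sqlite3.connect(":memory:")


def run(sql_query, wiki, shard_map=WIKI_TO_SHARD, connect=connect_to_shard):
    """Run `sql_query` for `wiki` without the caller knowing its shard."""
    try:
        shard = shard_map[wiki]
    except KeyError:
        raise ValueError(f"Unknown wiki: {wiki!r}") from None

    conn = connect(shard)
    try:
        # Return all rows as a list of tuples.
        return conn.execute(sql_query).fetchall()
    finally:
        conn.close()


rows = run("SELECT 1 + 1", "enwiki")
```

The caller only names the wiki; the lookup and connection details stay inside the library, which is the core ability we are asking for. Cross-wiki joins are deliberately out of scope here, matching the description above.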
Ideally, we would have this functionality available in both Python and R, but R poses an issue for the Analytics team because they have no experience with it. Possible solutions include:
- Someone from Product Analytics writes the R client and hands it over to Analytics for maintenance
- Analytics writes a command line utility which can be easily wrapped in both Python and R
- We make do with just the Python version
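As a rough illustration of the command line option, the sketch below shows one possible shape for such a utility: it takes a wiki and a query and writes the result as CSV to standard output, which both Python and R can read easily. The flag name `--wiki`, the CSV output format, and the `run_on_replica` backend are all assumptions made for this example; the backend is a placeholder that ignores the wiki and queries an in-memory SQLite database.

```python
import argparse
import csv
import sqlite3
import sys


def run_on_replica(sql_query, wiki):
    """Placeholder backend (hypothetical).

    A real tool would route the query to the shard hosting `wiki`; this
    stand-in ignores `wiki` and runs against in-memory SQLite.
    """
    conn = sqlite3.connect(":memory:")
    try:
        cursor = conn.execute(sql_query)
        header = [column[0] for column in cursor.description]
        return header, cursor.fetchall()
    finally:
        conn.close()


def main(argv=None, out=None):
    """Parse arguments, run the query, and write the result as CSV."""
    out = sys.stdout if out is None else out
    parser = argparse.ArgumentParser(
        description="Run a SQL query against one wiki's replica."
    )
    parser.add_argument("--wiki", required=True, help="wiki database name")
    parser.add_argument("sql_query", help="SQL query to run")
    args = parser.parse_args(argv)

    header, rows = run_on_replica(args.sql_query, args.wiki)
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(header)  # column names first
    writer.writerows(rows)   # then one line per result row
    return 0
```

A real script would finish by calling `sys.exit(main())`. A wrapper in either language would then only need to invoke the command and parse its CSV output, so the routing logic would live in one place.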