Creating a new task to follow-up on today's meeting. We decided that DBAs would look into creating materialized views to help both Analytics and cloud use cases. I suggest we edit this task to keep a list of requirements that the DBAs can work with.
Analytics Requirements
- Faster access to corresponding actor_name for archive, ipblocks, logging, and revision
- Faster access to corresponding comment_text for ipblocks, logging, and revision
- Our access pattern for importing is always: select [subset of fields] from [subset of tables] where timestamp < 'The first day of the current month'.
So, for example, if it's November 2nd, and the materialized views have updated so that they include at least all transactions through November 1st 00:00, that would be great for us. If this would be available but much later, like November 5th or so, then we'd have to do some tests to see how fast we can import from the new infrastructure.
Cloud Requirements
- Faster access to the agent and comment views for our users because those two tables depend on many subqueries to get a single result because the rules that determine visibility are in other tables.
- Cloud users access the data in all kinds of ways. From my scan of the definitions, a materialized view that is behind another table could cause a broken reference to a comment or actor record briefly, but it wouldn't result in an exposure because the rules are held in the external tables. However close to real-time updates of those tables that we can come is good, but whatever we determine is possible needs to be communicated to cloud users so that they can adjust things if needed.
The content table also does a subquery to the slots table, but it is a single subquery that is probably not going to be as much of a problem. It's just worth noting in case that is eventually incorrect.