I have been looking at design of the three tables querycache, querycachetwo (!), and qeurycache_info. These are issues I have found so far:
- None of them has PK
- The names are really badly chosen, these are not caches of database queries, or caches of API queries, These are caches of QueryPage class (some special pages and some API modules). Name of the class is also bad but I consider out of scope of this ticket.
- querycachetwo? seriously?
- qc_type and qcc_type can be normalized if querycache_info had PK
- qc_value is the most polymorphic column I have ever seen in my life. Sometimes it points out to user id, sometimes page id, sometimes number of pages linked to the certain page. This way we can't use qc_value to normalize using page or user table :/
- The tables are fairly big in small wikis (a wiki that was made last week and has 7k articles, has 57k rows in querycache making it 8th biggest table there while it's 70th largest table in enwiki), taking lots of storage for s3 but nothing bad for large wikis (@Marostegui or @jcrespo can check better of course)
- 'activeusers' type in querycachetwo actually doesn't take advantage of the extra columns there and simply can be moved to querycache table. As far as I checked. maybe I missed something.
I will try to come up with some new schema for them in the next couple of days. The only open question for me is the migration. Should we define new tables and move everything there (specially, they are being updated through jobs and not via user requests) or slowly fix things in the existing tables. I'm inclined to first option as adding PK or renaming tables can be tricky.
The other open question about it is if anyone willing to help with reviewing my patches. If anyone can help. It would be amazing