
Look into a simple way to have global keys with db-replicated
Closed, Resolved · Public · 0 Estimated Story Points

Description

Right now, makeGlobalKey() is not useful with db-replicated, since it always uses the local wiki table. It might be nice if SqlBagOStuff could route "global"-prefixed keys to a shared DB (e.g. metawiki or a DB on extension1) via a config option. That way, regular unshared keys would still be striped over the s1-s8 shards.
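For illustration, a minimal sketch of what such a config option could look like in $wgObjectCaches (the 'globalKeyLbDomain' knob and its value are hypothetical, not existing SqlBagOStuff parameters):

```php
// Hypothetical sketch only: 'globalKeyLbDomain' is an assumed option name.
$wgObjectCaches['db-replicated'] = [
	'class' => SqlBagOStuff::class,
	// makeKey() keys would keep going to the local wiki's objectcache
	// table, striped over the s1-s8 shards as today.
	// makeGlobalKey() keys would instead be routed to one shared DB:
	'globalKeyLbDomain' => 'metawiki', // or a database on extension1
];
```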

Event Timeline

This is presumably motivated by the effort to reduce use of the Main Stash, which provides fast and highly persistent storage. For things that don't need as much persistence, WANObjectCache (Memcached) can be used. For things that don't need retrieval to be as fast, db-replicated can often be used instead.

However, per this task, it has one important omission: it is scoped to a specific wiki (unlike all other forms of caching we have), which means anything that relates to cross-wiki or central functionality can't use it.
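To illustrate the scoping difference with the real BagOStuff key methods (a sketch; the key group and variables are placeholders):

```php
$cache = ObjectCache::getInstance( 'db-replicated' );

// Implicitly scoped to the current wiki:
$localKey = $cache->makeKey( 'mykeygroup', 'attempts', $userId );

// Meant to be shared across wikis, but with db-replicated it currently
// still resolves to the local wiki's objectcache table:
$globalKey = $cache->makeGlobalKey( 'mykeygroup', 'attempts', $centralUserId );
```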

Krinkle triaged this task as Medium priority. Jul 29 2019, 7:59 PM

Change 524655 had a related patch set uploaded (by Aaron Schulz; owner: Aaron Schulz):
[mediawiki/core@master] objectcache: dependency inject LoadBalancer into SqlBagOStuff

https://gerrit.wikimedia.org/r/524655
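In rough shape, the change moves SqlBagOStuff from managing its own database handles to accepting an injected load balancer (a simplified sketch under that assumption, not the actual patch):

```php
use Wikimedia\Rdbms\ILoadBalancer;

class SqlBagOStuff extends BagOStuff {
	/** @var ILoadBalancer|null */
	private $loadBalancer;

	public function __construct( array $params ) {
		parent::__construct( $params );
		// Injected by the caller instead of resolved internally; this is
		// what would later allow handing in a balancer for a shared or
		// global cluster rather than the local wiki's one.
		$this->loadBalancer = $params['loadBalancer'] ?? null;
	}

	private function getConnection() {
		if ( $this->loadBalancer ) {
			return $this->loadBalancer->getConnection( DB_MASTER );
		}
		// ... otherwise fall back to the server info given in $params ...
	}
}
```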

@jcrespo @Marostegui What do you think of the idea of having another cluster of mysql servers set up just like the parser cache ones? That would be nice from an HA perspective and would avoid adding extra load to any existing DB cluster (e.g. the objectcache table of metawiki, or extension1). Traffic would be modest given that it would start out being used for WikimediaEvents, LoginNotify, and perhaps AbuseFilter stats too (see https://docs.google.com/document/d/1tX8ekiYb3xYgpNJsmA1SiKqzkWc0F-_E4SGx6BI72vA/edit#heading=h.bdt9mhl3o7k5).

Thanks for the ping @aaron.
That document doesn't really show the expected amount of reads/writes or even the disk space usage.
We have to keep in mind that setting up a new cluster would require a minimum of 4 servers (2 per DC) to have HA, which means we'd need to look for budget for that, maintain more hosts, support another section, etc.
Leaving the budget thing aside, I think that maybe x1 could be a place for this.

x1 is at the moment underutilized on all fronts (QPS, disk space, etc.), so having this there doesn't really sound bad to me, unless you've got concerns about its future growth.
I wouldn't put this in an existing s1-s8 section indeed, but x1 might be a good place given the amount of resources we have there.
Even if we were concerned about read performance, we could look at adding 1 more host per DC to x1 if needed (which is still better than getting 4 new ones).

Rewording Manuel's words as actionables:

  • The design document should be filled in with more non-functional requirements (disk size, amount of writes, usage policy, etc.). I know those are probably TBD / to be researched, but they normally determine the resources needed to cover them; @aaron and other stakeholders should work on that so the DBAs can provide better feedback
  • Once that is clear, Manuel or I will add it to the budget for next year
  • If this is required now, Manuel suggests using x1

Now my take on this: I would prefer not to start with x1. x1 is mostly static metadata (and some small data) shared between wikis, while at first look this seems like a generic store for several things (but of course, the document is unclear); if the usage turns out to be very low, we could later merge it into x1. Maybe some of the test hosts can be used to make a production mockup, if other people no longer need them, to move things forward. But if these hosts are thought of as "parsercache but with non-cache data", I would be scared to merge that with x1 or other existing db servers (but again, my understanding is imperfect).

The other reason is that the parsercache model is not really a great model for data storage: there is no real HA, and I have been complaining about that for some time even for non-canonical data. I would be worried about applying the same model to canonical data without further HA changes to the architecture, either at the code or the db layer.

This is not a rant, but an answer to "What do you think of the idea of having another cluster of mysql servers set up just like the parser cache ones?": having another cluster should be OK, but the parsercache hosts are not a great model (compared to the es* or s* ones, for example).

Change 524655 merged by jenkins-bot:
[mediawiki/core@master] objectcache: dependency inject LoadBalancer into SqlBagOStuff

https://gerrit.wikimedia.org/r/524655

> We have to keep in mind that setting up a new cluster would require a minimum of 4 servers (2 per DC) to have HA, which means we'd need to look for budget for that, maintain more hosts, support another section, etc. […]

> x1 is at the moment underutilized on all fronts (QPS, disk space, etc.), so having this there doesn't really sound bad to me, unless you've got concerns about its future growth. […] x1 might be a good place given the amount of resources we have there.
> Even if we were concerned about read performance, we could look at adding 1 more host per DC to x1 if needed (which is still better than getting 4 new ones).

Note that this would obsolete the Redis machines we currently have for the main stash. (I guess not really, since those are currently co-located on the Memcached cluster.)

> […] I would be worried about applying the same model to canonical data without further HA changes to the architecture, either at the code or the db layer.

The main stash is currently backed by the Redis instances on the mc-hosts (labelled "redis-sessions" for legacy reasons). Depending on your definition, this is probably not canonical data. These are relatively short-lived (up to a week or so) key-value pairs, for which fallback behaviours exist if they were to go missing. However, unlike WANCache/Memcached, the data in the main stash cannot usually be re-computed. It is more akin to user sessions, ChronologyProtector positions, and rate limit counters in that regard: the data does matter and can't be re-computed if missing, but we only degrade temporarily if it disappears.

Telemetry on current data size and read/write actions can be found under Redis: redis_sessions in Grafana.

Currently:

  • Data size: 3GB in total. The Redis cluster has 9GB capacity (18x520MB). The data size is stable at ~3GB; things are generally added and expired at about the same rate.
  • Rate: 25-35K ops/second (20-30K/s reads, 3K/s sets).

These will become lower once the last remaining sessions are moved over to Cassandra (see T243106 and Cassandra: Sessionstore in Grafana). That last switch introduced about +0.5GB of data and +13K req/s (of which +12K reads and +1K writes), so we can assume those will be deducted from the above.

Expected:

  • Data size: 2.5GB.
  • Rate: 12-22K ops/second (8-18K/s reads, 2K/s sets).
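For the record, the expected figures are just the current ones minus the Cassandra-bound traffic: 3GB - 0.5GB = 2.5GB of data; 20-30K/s - 12K/s = 8-18K/s reads; 3K/s - 1K/s = 2K/s sets; hence 12-22K ops/second in total.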

Thanks for providing those figures.
While the data is pretty tiny, the expected amount of reads is quite big, so we should probably look at buying servers for this and not use x1.
Probably 3 servers per DC would be enough, especially if the amount of reads slowly decreases over time.

This is probably the right moment to get this budgeted, as we are planning what we'll need for the next FY. I will talk to @mark on Thursday and let him know about these requirements.

This task is about the core support which has landed in master. Let's continue this on the parent task.

Krinkle reassigned this task from Krinkle to aaron.