Page MenuHomePhabricator

Add modtoken field and flags to objectcache table
Closed, ResolvedPublic

Description

If 'objectcache' had a UNIX timestamp integer field, it could be used to improve eventual consistency with mysql statement-based circular replication. The field could have a default value and would only be referenced if configured for easier backwards compatibility.

A related feature would be to make delete() use tombstones with a TTL. Tombstone updates would also make use of the timestamp field and could "write-hole" older writes that took a while to replicate over.

A flags field could also be used to signal tombstones, new compression and serialization formats (e.g. JSON).

Related Objects

Event Timeline

Krinkle reassigned this task from Krinkle to aaron.
Krinkle triaged this task as Medium priority.
Krinkle moved this task from Inbox, needs triage to Doing (old) on the Performance-Team board.
Krinkle subscribed.

Change 666780 had a related patch set uploaded (by Aaron Schulz; owner: Aaron Schulz):
[mediawiki/core@master] objectcache: add modtime field to objectcache table

https://gerrit.wikimedia.org/r/666780

Change 666781 had a related patch set uploaded (by Krinkle; owner: Aaron Schulz):
[mediawiki/core@master] objectcache: make use of modtime field in SqlBagOStuff

https://gerrit.wikimedia.org/r/666781

aaron renamed this task from Add a write timestamp field to objectcache table to Add a write timestamp field and flags field to objectcache table.Apr 27 2021, 5:52 PM
aaron updated the task description. (Show Details)

Change 685131 had a related patch set uploaded (by Krinkle; author: Aaron Schulz):

[mediawiki/core@master] rdbms: add an IDatabase method to expose DB server IDs

https://gerrit.wikimedia.org/r/685131

Change 685131 merged by jenkins-bot:

[mediawiki/core@master] rdbms: add an IDatabase method to expose DB server IDs

https://gerrit.wikimedia.org/r/685131

Change 666780 merged by jenkins-bot:

[mediawiki/core@master] objectcache: add last-modified token field to objectcache table

https://gerrit.wikimedia.org/r/666780

Krinkle renamed this task from Add a write timestamp field and flags field to objectcache table to Add modtoken field and flags to objectcache table.May 27 2021, 2:03 AM

you need to create a schema change ticket for DBAs https://phabricator.wikimedia.org/tag/blocked-on-schema-change/ It's showing up in the drift reports now.

@aaron We could re-use this ticket, but either way needs to be tracked as schema change and put on the radar of DBAs.

Also, during T282761 I noticed that several previous scheme chanages have not been applied to production either. For example, the exptime column is sometimes MySQL datetime in production instead of blob. My presumably unforeseen miracle, this actually works since MariaDB is not just able to cast MW's 14-digit timestamp values to a proper datetime value with the same underlying date, it also is able to transparently compare these, so exptime > etc works as intended. But.. we probably shouldn't keep relying on that for something as critical as the parser caches.

Change 698909 had a related patch set uploaded (by Aaron Schulz; author: Aaron Schulz):

[mediawiki/core@master] rdbms: track the acquisition timestamps of named locks in Database

https://gerrit.wikimedia.org/r/698909

Change 692462 had a related patch set uploaded (by Aaron Schulz; author: Aaron Schulz):

[mediawiki/core@master] objectcache: make BagOStuff::lock/unlock easier to override

https://gerrit.wikimedia.org/r/692462

@aaron I'm a bit at a loss here. Please help me understand this better. The below is most likely wrong, but here is what I've currently got:

  • We have a MainStash service that currently exposes a full BagOStuff interface, although there is a plan to narrow this which afaik you're onboard with and in part are the initiator of. There is a non-zero number of consumers of MainStash methods that we think probably shouldn't be used, but are, e.g. callers to CAS, ADD or INCR-with-used-return-value.
    • And we think a subset of those callers may be subject to race conditions where we think it would be bad if a non-latest value wins that race.
      • And, with the current modtoken patch (https://gerrit.wikimedia.org/r/c/666781), we believe it would become more likely for such non-latest wins to happen (per "weak ordered token"), unless we also introduce a new notion of "critical ordered tokens" at the same time.
        • These critical-ordered modtokens would be based on a timestamp that for unclear reasons must have a level of precision that is significantly higher than the level of precision we can expect to (mostly) consistently move forward between appservers, E.g. if we used second-level or minute-level precision, we'd be fine presumably save for a very small window around the whole minute mark. But, we're going with microsecond precision, and thus need a centralised source, hence a roundtrip to the database server.
          • I indicated a desire to avoid such roundtrip, which we mitigated by merging the timestamp read query together with the db session lock query.

This seems like a reasonable compromise. (I do still have question marks around some of the higher bullet points, and who this will ultimately serve. I'm not disagreeing, I just want to understand it better and preserve it for future reference as otherwise it will get lost. But, given we're trying to make progress, I think that could wait until after merge.)

But now we're going to also significantly alter BagOStuff::lock and IDatabase::lock to offer this very specific optimisation as new low-level primitive built-in to the signature of these core methods.

It sounds to me like many of the higher-level bullet points are ephemeral and not bound to product requirements. That is, given a bit of maintenance love, I would expect some of those to cease to exist within a year or two. Do you see these as generally useful and worthwhile in that case? Would we likely end up using these elsewhere in a way that justifies the added complexity and runtime cost to all locks everywhere? (Or even if we made it opt-in, the complexity and API addition still stand.)

I don't see the WatchedItemStore , FIlterProfiler, or EmergencyCache use of merge() going anywhere anytime soon. It seems easier to just keep it working reliably.

The current modtoken patch does not rely on microsecond precision anymore to handle write races or CAS tokens, though it does rely on second precision timestamps. It uses DB timestamps to give more reliably non-decreasing timestamps for operations with semantics that make invocations well-ordered.

The Database/BagOStuff patches use microsecond precision since it isn't really any harder to do and is strictly more useful (e.g. for making CAS tokens or unique IDs and so on).
Also, note that BagOStuff already tracked lock timestamps indirectly in order to log about unlock()ing expired locks.

I don't think the IF/SYSDATE() and extra SQL query character overhead for lock() is significant.

Change 698909 merged by jenkins-bot:

[mediawiki/core@master] rdbms: track the acquisition timestamps of named locks in Database

https://gerrit.wikimedia.org/r/698909

Change 692462 merged by jenkins-bot:

[mediawiki/core@master] objectcache: make BagOStuff::lock/unlock easier to override

https://gerrit.wikimedia.org/r/692462

Change 574101 had a related patch set uploaded (by Krinkle; author: Aaron Schulz):

[mediawiki/core@master] rdbms: allow merging IDatabase::upsert() rows with current ones

https://gerrit.wikimedia.org/r/574101

Change 574101 had a related patch set uploaded (by Krinkle; author: Aaron Schulz):

[mediawiki/core@master] rdbms: allow merging IDatabase::upsert() rows with current ones

https://gerrit.wikimedia.org/r/574101

As I understand it, this patch is motivated by wanting support for multiple rows in upsert(), just as replace() allows already, by using slightly different MySQL syntax for the upsert as we currently used syntax that could only accomodate a single row reliably.

The context is that change 666781 moves SqlBagOStuff away from replace() to using upsert() by default, thus without this additional change, it would mean things like setMulti() would become serial queries.

Krinkle raised the priority of this task from Medium to High.Aug 9 2021, 6:57 PM

Change 666781 merged by jenkins-bot:

[mediawiki/core@master] objectcache: make use of new `modtoken` field in SqlBagOStuff

https://gerrit.wikimedia.org/r/666781

gerritbot wrote:

[mediawiki/core] objectcache: make use of new `modtoken` field in SqlBagOStuff

https://gerrit.wikimedia.org/r/666781

This change has been reverted due to causing a significant ParserCache disk space increase in production, T288998.

gerritbot wrote:

Change 713364 merged by jenkins-bot:

[mediawiki/core] Revert \"objectcache: make use of new `modtoken` field in SqlBagOStuff\"

https://gerrit.wikimedia.org/r/713364

We'll need to investigate this before trying again. It might be unrelated to schema issues, but it would make things a lot easier to reason about if parsercache dbs and tables matched their defined schemas indeed. When I checked last year, some of the tables were many years out of date and also inconsistent within them (e.g. some newer segments were different from older segments).

Change 713925 had a related patch set uploaded (by Aaron Schulz; author: Aaron Schulz):

[mediawiki/core@master] objectcache: make use of new `modtoken` field in SqlBagOStuff (ii)

https://gerrit.wikimedia.org/r/713925

Can I ask the patch to have more tests please?

Change 713925 merged by jenkins-bot:

[mediawiki/core@master] objectcache: make use of new `modtoken` field in SqlBagOStuff (ii)

https://gerrit.wikimedia.org/r/713925

Umherirrender subscribed.

you need to create a schema change ticket for DBAs https://phabricator.wikimedia.org/tag/blocked-on-schema-change/ It's showing up in the drift reports now.

@aaron We could re-use this ticket, but either way needs to be tracked as schema change and put on the radar of DBAs.

Also, during T282761 I noticed that several previous scheme chanages have not been applied to production either. For example, the exptime column is sometimes MySQL datetime in production instead of blob. My presumably unforeseen miracle, this actually works since MariaDB is not just able to cast MW's 14-digit timestamp values to a proper datetime value with the same underlying date, it also is able to transparently compare these, so exptime > etc works as intended. But.. we probably shouldn't keep relying on that for something as critical as the parser caches.

Created T298206

In September, an hour after Aaron resolved this, I marked T272512 as subtask of T212129 to track the schema change for production.

In September, an hour after Aaron resolved this, I marked T272512 as subtask of T212129 to track the schema change for production.

Is that a blocker or the same work? It is a different ALTER TABLE but certainly can be batched.

In September, an hour after Aaron resolved this, I marked T272512 as subtask of T212129 to track the schema change for production.

That task does not mention modtoken/flags, for easier search and tracking I have created the new one. If that should be one task, feel free to merge. Sorry for the extra work.