
Move user_editcount into its own table, and shard it
Open, Needs Triage, Public

Description

Why

  • reduce database lock contention, which would reduce site outages and performance issues
  • requested by @Ladsgroup in T365303#11119098

The next bottleneck here is actually updating user_editcount, which has the same problem (a deferred update on the same row). I want to eventually move that column out of the user table, which would also let us shard it, e.g. each user would have ten rows picked at random so they wouldn't compete for the lock on the same row. If anyone feels like doing it, that would help a lot on both Commons and Wikidata.

What

  • split user.user_editcount into a new table called [TBD]
  • the columns will be [TBD]
  • users with zero edits should have zero rows in the table. no row = assume the user has 0 edits
  • this table should use sharding; this reduces lock contention (a rough sketch follows this list)
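
As a rough sketch only (the table and column names below are placeholders, since both are still TBD above; ten shards per user follows the quoted suggestion), the sharded layout could look like this:

CREATE TABLE user_editcount_shard (
    uecs_user  INT UNSIGNED NOT NULL,      -- user.user_id
    uecs_shard TINYINT UNSIGNED NOT NULL,  -- random shard number in 0..9
    uecs_count INT UNSIGNED NOT NULL DEFAULT 0,
    PRIMARY KEY (uecs_user, uecs_shard)
);

-- On edit: pick a random shard so that concurrent edits by the same user
-- usually hit different rows instead of waiting on the same row lock
INSERT INTO user_editcount_shard (uecs_user, uecs_shard, uecs_count)
VALUES (@user_id, FLOOR(RAND() * 10), 1)
ON DUPLICATE KEY UPDATE uecs_count = uecs_count + 1;

-- Reading the total means summing the (at most ten) shard rows
SELECT COALESCE(SUM(uecs_count), 0) AS user_editcount
FROM user_editcount_shard
WHERE uecs_user = @user_id;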

Event Timeline

we eventually want to delete user_editcount entirely

I believe the answer is yes, since a legacy column just clutters the user table and confuses users querying it. This will also silently break (i.e. cause sneaky issues in) extensions that do not know about the new table.

sharding

If it is needed, then please do it on the existing global_edit_count table first. I now oppose sharding; see below.

Please also note that, unlike with initSiteStats.php, there is no reliable way to recover the edit count if it is lost (e.g. initEditCount.php currently does not count deleted edits).

Thanks for filing this ticket with a lot more details!!

the sharding should be behind a feature flag

sharding should only be enabled on a few very large wikis, since it sacrifices query performance.

Sharding a column by having 10 rows in place of 1 seems like an ugly workaround rather than a clean architectural solution. Among other things, it makes the table confusing to query (especially for users of wiki replicas, who can't just use a getEditCount() method that core could provide).

Updates to user_editcount aren't time-critical, so we could instead buffer the updates by sending them to redis first and have workers periodically aggregate the deltas and flush them to the db. This would also reduce the chance of row lock contention to zero, instead of just reducing it 10x.

Updates to user_editcount aren't time-critical

This would be true for old users and users with many edits, but for users with few edits the count is used to determine autoconfirmed status and autopromotion. A lag of no more than, say, 10s is required for them.

Note that for those users we do not need sharding either.

it makes the table confusing to query

This should be taken as another reason to drop user_editcount once we have the new table.

Sharding a column by having 10 rows in place of 1 seems like an ugly workaround rather than a clean architectural solution. Among other things, it makes the table confusing to query (especially for users of wiki replicas, who can't just use a getEditCount() method that core could provide).

It is a very common practice in the industry. I actually think it's quite smart and elegant.

Updates to user_editcount aren't time-critical, so we could instead buffer the updates by sending them to redis first and have workers periodically aggregate the deltas and flush them to the db. This would also reduce the chance of row lock contention to zero, instead of just reducing it 10x.

We don't have redis for such use cases and we can't set it up for this. We could put the deltas in memcached, but the data there is not guaranteed and might get purged without being flushed to disk.

It is a very common practice in the industry. I actually think it's quite smart and elegant.

Fair. I was unaware of the pattern, though I still prefer the buffered updates to avoid compromising schema purity.

We don't have redis for such use cases and we can't set it up for this.

We seem to have redis in prod; what's wrong with reusing the same cluster?

I had most of the below written out in a draft comment, so I'm posting it anyway (free knowledge for those of you new to the wonders of redis!)

Approach:

  • Use a redis hash to store deltas of edit counts. Just after a flush occurs, this is empty.
  • Every time a user edits, HINCRBY editcount-deltas <user_id> 1 increments the delta. HINCRBY is an atomic operation; no write-after-read is needed.
  • Every 10-20 seconds, flush the deltas:
    • HKEYS editcount-deltas -> gets all user ids
    • Use HGETDEL to get the count for each user id, simultaneously deleting it, and update the db (the database side of this step is sketched after this list). The atomic read+deletion ensures that if the user edits while this is in progress, all the edits are accounted for (in either the current flush or the next one).
      • If the db update fails, use HINCRBY to reinsert to redis for retry in the next flush.
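
For the "update the db" step, the database side of each flush would just be a per-user increment. A minimal sketch, assuming the counter still lives in user.user_editcount (or, later, in whatever new table replaces it), with :user_id and :delta standing in for the values returned by the HGETDEL loop:

-- One statement per user id in the flushed batch; only the flush worker
-- issues these writes, since web requests only touch the redis hash here
UPDATE user
SET user_editcount = user_editcount + :delta
WHERE user_id = :user_id;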

It is a very common practice in the industry. I actually think it's quite smart and elegant.

Fair. I was unaware of the pattern, though I still prefer the buffered updates to avoid compromising schema purity.

We don't have redis for such use cases and we can't set it up for this.

We seem to have redis in prod; what's wrong with reusing the same cluster?

I am aware of redis in our infra. I maintained the redis backend of ores for many years (https://wikitech.wikimedia.org/w/index.php?title=Redis&oldid=1861959), switched some parts of the wikibase code to use redis (T159826: Use redis-based lock manager in dispatch changes in production), and even last month I did some stuff with redis: T386640#10873880

The problem is that we are actively avoiding redis. The current setup in production is just for MediaWiki's lock management (RedisLockManager), and I think there is a desire to migrate off redis for that too and basically shut it down. The main reason is to reduce the complexity of the infra and its moving parts, and adding more to redis goes in the opposite direction. In other words, you would need to convince the serviceops team to allow more stuff to be added to redis.

Is there a reason to use row splitting instead of a separate table for deltas which would be aggregated into the main value periodically? By delta aggregation I mean something like the below; row splitting is described in the task description.

Delta table definition

CREATE TABLE user_editcount_delta (
    uecd_id bigint(20) unsigned NOT NULL AUTO_INCREMENT PRIMARY KEY,
    uecd_user_id int(10) unsigned NOT NULL,
    uecd_delta TINYINT DEFAULT 0,
    uecd_batch_token VARBINARY(32) NULL,
    
    INDEX idx_user_batch (uecd_user_id, uecd_batch_token),  
    INDEX idx_batch_token (uecd_batch_token)  
);

Adding new edits

INSERT INTO user_editcount_delta (uecd_user_id, uecd_delta) 
VALUES ($user_id, 1);

Querying exact edit count for user

SELECT u.user_editcount + COALESCE(SUM(d.uecd_delta), 0) AS exact_edit_count
FROM user AS u
LEFT JOIN user_editcount_delta AS d 
    ON u.user_id = d.uecd_user_id 
    AND d.uecd_batch_token IS NULL  -- only sum unapplied deltas
WHERE u.user_id = $user_id
GROUP BY u.user_id;

Aggregating delta values to the user table

SET @token = UNHEX(REPLACE(UUID(),'-',''));

START TRANSACTION;

-- Mark deltas for processing
UPDATE user_editcount_delta
SET uecd_batch_token = @token
WHERE uecd_batch_token IS NULL
ORDER BY uecd_id  -- process the oldest deltas first
LIMIT 50000;

-- Apply aggregated deltas to user table
UPDATE user AS u
JOIN (
    SELECT
        uecd_user_id,
        SUM(uecd_delta) AS delta
    FROM user_editcount_delta
    WHERE uecd_batch_token = @token
    GROUP BY uecd_user_id
) AS deltas ON u.user_id = deltas.uecd_user_id
SET u.user_editcount = u.user_editcount + deltas.delta;

-- Clean up processed deltas
DELETE FROM user_editcount_delta
WHERE uecd_batch_token = @token;

COMMIT;

Is there a reason to use row splitting instead of a separate table for deltas which would be aggregated into the main value periodically?

Answer to myself.

Row splitting is better than delta+aggregation because it is simpler to implement and maintain in the long run.

The con would be that queries which sort all users by edit count require aggregating all the data in the table before sorting, whereas delta+aggregation can use an index and has less data to aggregate.
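
To make that con concrete, here is a rough comparison, assuming a sharded table with placeholder names user_editcount_shard(uecs_user, uecs_shard, uecs_count) and, for the single-column case, an index on the counter:

-- Sharded layout: every shard row has to be aggregated before sorting can start
SELECT uecs_user, SUM(uecs_count) AS editcount
FROM user_editcount_shard
GROUP BY uecs_user
ORDER BY editcount DESC
LIMIT 100;

-- Single counter column with an index on it: the index already provides the
-- order, so the scan can stop after 100 rows
SELECT user_id, user_editcount
FROM user
ORDER BY user_editcount DESC
LIMIT 100;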

MediaWiki allows the user table to be shared across wikis, so even if we created a general counter table (T402854) we could not use it, since it cannot be meaningfully shared. For now, if multiple wikis have a shared user table, they will also have a shared user edit count. When we split the edit count into its own table, we can let sysadmins choose whether to share it or not.

Update: We theoretically can use the general counter table, but it requires a virtual database domain, which is not how shared databases currently work.