
Create database table to cache data about mentees
Closed, ResolvedPublic

Description

Background

The Growth team is soon going to work on the mentor dashboard, see T278920: Mentor dashboard: V1 desktop. We will want to show various information about mentees, as part of T278971: Mentor dashboard: M1 mentee overview module and T279139: Mentor dashboard: M5 impact module.

Since the numbers will take some time to generate, we will want to cache them in a database table and populate it with a maintenance script.

Objective

Design the database table, and merge a patch that creates it.

Additional information

Answers questions from https://wikitech.wikimedia.org/wiki/Creating_new_tables:

  • Should this table be replicated to wiki replicas (does it contain private data)?: No, it should be private.
  • Size of the table (number of rows expected): ~1500 rows per 120 newly created accounts per month; the number of rows will be stable.
  • Expected growth per year: The table should not grow, unless the number of newcomers for a wiki changes substantially.
  • Expected amount of queries: Writes will happen only from mwmaint1002 via a caching MW script (frequency not decided yet, maybe daily; the script will likely touch most of the rows). The number of reads will be small, as the data will be consumed by a special page that will likely be used only by a couple of experienced Wikimedians.
  • The release plan for the feature: Not decided yet; the wikis listed in https://github.com/wikimedia/operations-mediawiki-config/blob/master/dblists/growthexperiments.dblist will get the features once they are ready. We might want to test them first on a subset of those wikis, but that has not been decided yet.

Event Timeline

Change 677653 had a related patch set uploaded (by Urbanecm; author: Urbanecm):

[operations/puppet@production] Add growthexperiments_mentee_data to private tables

https://gerrit.wikimedia.org/r/677653

@Tgr Hello, could you review the table design and this task please? Thanks!

Change 677655 had a related patch set uploaded (by Urbanecm; author: Urbanecm):

[mediawiki/extensions/GrowthExperiments@master] Create database table to store data about mentees

https://gerrit.wikimedia.org/r/677655

As this is a cache table with discardable content, we hope to do all the searching and sorting on the client side, and our understanding of what is and isn't important data for this table will no doubt be shaped by initial experiences and user feedback. Maybe just go with something unstructured, such as a mentee ID -> JSON blob lookup table? That would make changes to the data dimensions much less painful.

Good point. @Marostegui What would be your opinion on this?
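
To make the suggestion above concrete, here is a minimal sketch of how the data could be stored and evolved under that approach; the JSON field names and values are purely hypothetical and only illustrate the shape of the data:

-- Hypothetical sketch of the unstructured approach: one row per mentee, with
-- all cached metrics packed into a JSON blob (field names invented for this example).
INSERT INTO growthexperiments_mentee_data (mentee_id, mentee_data)
VALUES (12345, '{"editcount": 42, "reverted": 3, "last_active": "2021-04-01"}');
-- Adding a new data dimension later (say, "questions_asked") only changes the
-- JSON payload written by the script; no ALTER TABLE is needed.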

I was going to ask what @Tgr actually explained, so thanks for that :-)
Just some more questions:

  • What would happen if this table isn't writable for any reason? (ie: host down, host under maintenance...).
  • What would happen if the script doesn't run for a day or a week? (ie: broken for whatever reason, mwmaint host down?)
  • What would happen if the script gets killed in the middle of a run?
  • Is this a table per wiki or a table containing all the wikis and living in x1?
Marostegui triaged this task as Medium priority. Apr 8 2021, 5:52 AM
Marostegui moved this task from Triage to Refine on the DBA board.

I was going to ask what @Tgr actually explained, so thanks for that :-)

Okay, good to know :).

Just some more questions:

  • What would happen if this table isn't writable for any reason? (ie: host down, host under maintenance...).

If the table merely isn't writable, the script simply won't update the cached data, and the frontend will reuse the old data. Users will experience a delay in the data being updated, but that isn't a concern.

  • What would happen if the script doesn't run for a day or a week? (ie: broken for whatever reason, mwmaint host down?)

Same as above. A breakage lasting days or weeks would definitely be a noticeable issue, but nothing would break terribly (no exceptions thrown, etc.); the data would just not be updated.

  • What would happen if the script gets killed in the middle of a run?

It will update the cache only for some of the mentees. Mentors (the consumers of the data about their mentees) will know the feature caches the data heavily, and the only thing they'll notice is that the data is partially out of date. Unless the script gets killed regularly (like on every run), it shouldn't be an issue.

  • Is this a table per wiki or a table containing all the wikis and living in x1?

It should be a per-wiki table living in the x1 cluster (i.e. in the per-wiki databases, not in wikishared), unless you prefer to put the table elsewhere, of course.

Thanks for the answers, they look good to me. x1 should be fine, as this doesn't look like a table that will have much load (neither writes nor reads).
As for next steps, could you come up with a draft of the schema and some of the read queries it will be receiving?
If we use a BLOB and you need an index on that column, we'd need to specify how many characters we'd be indexing.
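
For reference, an index on a BLOB/TEXT column has to declare a prefix length. A minimal, purely hypothetical sketch (invented index name and length) is below; as the reply further down notes, no extra index ends up being needed for this table.

-- Hypothetical sketch only: a prefix index on the BLOB column would have to
-- specify how many leading bytes to index.
ALTER TABLE growthexperiments_mentee_data
  ADD INDEX gemd_data_prefix (mentee_data(64));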

Sure, here you go:

CREATE TABLE /*_*/growthexperiments_mentee_data (
  mentee_id INT UNSIGNED NOT NULL,
  mentee_data BLOB NOT NULL,
  PRIMARY KEY(mentee_id)
) /*$wgDBTableOptions*/;

I don't need any other index besides the primary key. Example of a read query: SELECT mentee_id, mentee_data FROM growthexperiments_mentee_data WHERE mentee_id IN (123, 342, 965, ...). The JSON blob will be processed by application-level logic.

Does that look good to you?
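
As a side note on the earlier question about the script being killed mid-run: the write pattern is not specified in this task, but assuming MariaDB upsert syntax and invented values, the refresh could be made idempotent along these lines, so a run killed half-way can simply be re-run:

-- Sketch only; the actual maintenance script is out of scope for this task.
-- Upsert the freshly computed data for each mentee (values invented):
INSERT INTO growthexperiments_mentee_data (mentee_id, mentee_data)
VALUES
  (123, '{"editcount": 10}'),
  (342, '{"editcount": 57}')
ON DUPLICATE KEY UPDATE mentee_data = VALUES(mentee_data);
-- Optionally prune rows for users who are no longer mentees:
DELETE FROM growthexperiments_mentee_data
  WHERE mentee_id NOT IN (123, 342 /* ..., current mentee IDs */);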

Change 678312 had a related patch set uploaded (by Urbanecm; author: Urbanecm):

[mediawiki/extensions/WikimediaMaintenance@master] GrowthExperiments: Create growthexperiments_mentee_data DB table

https://gerrit.wikimedia.org/r/678312

Change 677655 merged by jenkins-bot:

[mediawiki/extensions/GrowthExperiments@master] Create database table to store data about mentees

https://gerrit.wikimedia.org/r/677655

Change 678312 merged by jenkins-bot:

[mediawiki/extensions/WikimediaMaintenance@master] GrowthExperiments: Create growthexperiments_mentee_data DB table

https://gerrit.wikimedia.org/r/678312

Sure, here you go:

CREATE TABLE /*_*/growthexperiments_mentee_data (
  mentee_id INT UNSIGNED NOT NULL,
  mentee_data BLOB NOT NULL,
  PRIMARY KEY(mentee_id)
) /*$wgDBTableOptions*/;

I don't need any other index besides the primary key. Example of a read query: SELECT mentee_id, mentee_data FROM growthexperiments_mentee_data WHERE mentee_id IN (123, 342, 965, ...). The JSON blob will be processed by application-level logic.

Does that look good to you?

This looks good. In some rare cases we've seen the MariaDB optimizer go a bit crazy when doing massive SELECTs with an IN clause (especially with big tables, but it doesn't look like this table will reach a crazy number of rows).
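
If very large IN lists ever did become a concern here, one purely hypothetical mitigation would be to split the lookup into smaller batches on the application side, for example:

-- Hypothetical sketch only; not needed at the expected table size.
SELECT mentee_id, mentee_data FROM growthexperiments_mentee_data
  WHERE mentee_id IN (123, 342, 965);      -- first batch of IDs
SELECT mentee_id, mentee_data FROM growthexperiments_mentee_data
  WHERE mentee_id IN (1024, 2048, 4096);   -- next batch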

Change 677653 merged by Marostegui:

[operations/puppet@production] Add growthexperiments_mentee_data to private tables

https://gerrit.wikimedia.org/r/677653

Mentioned in SAL (#wikimedia-operations) [2021-04-13T08:16:37Z] <marostegui> Restart sanitarium hosts db1124, db1125, db1154, db1155, db2094, db2095 T279587

All sanitarium hosts have been restarted

Mentioned in SAL (#wikimedia-operations) [2021-04-20T23:28:51Z] <Urbanecm> [urbanecm@mwmaint1002 ~]$ foreachwikiindblist growthexperiments sql.php --cluster=extension1 /srv/mediawiki/php-1.37.0-wmf.1/extensions/GrowthExperiments/maintenance/schemas/mysql/growthexperiments_mentee_data.sql # T279587

Table should be live, nothing else needed.