
Finalize and create api_feature_usage table on x1
Open, Stalled, Low, Public

Description

The api_feature_usage table for the ApiFeatureUsage extension was proposed in https://gerrit.wikimedia.org/r/1020385 as part of T313731. This is a single global table that covers all wikis. Usage of this table is behind a feature flag (changing $wgApiFeatureUsageQueryEngineConf). The default is to keep using the cirrus search engine.

Should this table be replicated to wiki replicas (does it not contain private data)?

No, in order to reduce the risk of needing to suppress entries due to requests with "User-Agent: <doxxing info>". The public interfaces (api, special page) require specifying an agent when enumerating entries. However, matching the current elastic engine behavior, it is only a *prefix* search... that should probably be changed to an exact match, to avoid a prefix becoming a "distribution channel". That might be slightly annoying for people maintaining multiple bots when checking whether any of them use deprecated features.

Will you be doing cross-joins with the wiki metadata?

No

Size of the table (number of rows expected).

~6 million, assuming 62,000 rows for each day of retention and a 90 day retention period
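As a sanity check on the arithmetic (both inputs, 62,000 rows/day and the 90-day retention period, are taken from the estimate above):

```python
rows_per_day = 62_000    # estimated new counter rows per day of retention
retention_days = 90      # proposed retention period

total_rows = rows_per_day * retention_days
print(total_rows)  # 5,580,000, i.e. roughly 6 million
```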

Expected growth per year (number of rows).

Little continuous growth is expected. However, the size can increase or decrease depending on how many API features are deprecated at once and how popular they are.

Expected amount of queries, both writes and reads (per minute, per hour...per day, any of those are ok).

Reads will be negligible. Writes should be less than 27 queries/sec, assuming the increments over a day (e.g. 27 million) are spread out over the (agent,feature) counters (e.g. 64,000). This rounds up the ~421 counter hits/day per (agent,feature) counter to 500. In reality, many of those tuples will only get the initial increment/init (e.g. 32K out of 64K), meaning that adaptive sampling would reduce the write rate further.

Going into a bit more detail: for each (agent,feature) counter, it takes ~25 writes for the first 500 hits given the sampling approach. The first 5K hits take ~50 writes, and the first 50K take ~100 writes.
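The write amplification described above can be illustrated with a toy probabilistic counter. This is only a sketch of one possible adaptive sampling scheme, not the extension's actual algorithm: the i-th hit triggers a write with probability min(1, k/i), and each write adds the inverse probability, so the stored sum remains an unbiased estimate of the true hit count. Expected writes for n hits are then roughly k * (1 + ln(n/k)) — slow growth, in the same spirit as the ~25 / ~50 / ~100 figures quoted above.

```python
import random

def simulate_counter(hits: int, rng: random.Random, k: int = 5):
    """Toy adaptive-sampling counter for one (agent, feature) tuple.

    The i-th hit triggers a DB write with probability min(1, k/i); each
    write adds 1/p so the stored sum stays an unbiased estimate of the
    true hit count. Returns (db_writes, stored_estimate).
    """
    writes = 0
    stored = 0.0
    for i in range(1, hits + 1):
        p = min(1.0, k / i)
        if rng.random() < p:
            writes += 1
            stored += 1.0 / p
    return writes, stored

# Writes grow roughly like k * (1 + ln(n/k)) while the estimate tracks n.
for n in (500, 5_000, 50_000):
    w, est = simulate_counter(n, random.Random(0))
    print(n, w, round(est))
```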

Examples of queries that will be using the table.

SELECT afu_date FROM `api_feature_usage` WHERE (afu_date >= '1') ORDER BY afu_date ASC LIMIT 1
SELECT afu_date,afu_feature,SUM(afu_hits) AS `hits` FROM `api_feature_usage` WHERE (afu_agent LIKE 'testing-bot%' ESCAPE '`') AND (afu_date >= '1') GROUP BY afu_date,afu_feature ORDER BY afu_date ASC,afu_feature ASC

The release plan for the feature (are there specific wikis you'd like to test first etc).

Deploy to (testwiki, test2wiki, mediawikiwiki) first. Observe table rows and logs. Use ab (ApacheBench) to verify that spikes of API queries from one agent using a deprecated feature do not affect MariaDB com_xxx graphs. Deploy to commonswiki after a few days, to wikidatawiki a week later, then everywhere a week after that.

Event Timeline

ABran-WMF added a subscriber: Ladsgroup.

Thank you for tagging us. We were not notified about this request. I will get to it ASAP.

One question is whether I should just add an autoincrement ID. It would make the DELETE queries for expiration simpler. Originally, I wanted to have the schema support both regular and circular replication. Given the operational fun involved with the latter, we could probably just bake in the non-circular-replication assumption in exchange for simple IN() clause deletion instead of using factorConds().
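For illustration, the two expiration strategies might look like the following sketch. The afu_id column is hypothetical (the would-be autoincrement ID); afu_date/afu_feature/afu_agent are the columns visible in the example queries above, and the actual schema is defined in the Gerrit change.

```python
# Sketch only: afu_id is a hypothetical autoincrement column, not part
# of the proposed schema. Shows why an ID column simplifies expiry.

def delete_by_ids(ids):
    """With an autoincrement ID: select expired ids first, then delete
    them with a simple, batchable IN() clause."""
    id_list = ", ".join(str(i) for i in ids)
    return f"DELETE FROM api_feature_usage WHERE afu_id IN ({id_list})"

def delete_by_tuple_conds(tuples):
    """Without an ID: delete by (date, feature, agent) tuples, i.e. the
    kind of compound condition factorConds() would build."""
    conds = " OR ".join(
        f"(afu_date = '{d}' AND afu_feature = '{f}' AND afu_agent = '{a}')"
        for d, f, a in tuples
    )
    return f"DELETE FROM api_feature_usage WHERE {conds}"

print(delete_by_ids([1, 2, 3]))
```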

Ladsgroup changed the task status from Open to Stalled. Oct 22 2024, 8:17 PM

This has to wait until I'm back from some travel, unfortunately.

I haven't reviewed the code in depth but the values provided here look good to me. My notes are:

  • My biggest concern is around bugs or unexpected issues which would explode after deployment. I suggest first enabling in beta cluster and heavily testing it before moving forward to testwikis in production.
  • Please add it to the tables catalog (https://github.com/wikimedia/operations-puppet/blob/production/modules/mediawiki/files/mariadb/tables-catalog.yaml)
  • Please add it to the list of private tables in puppet; we also need to reload the sanitarium hosts to make sure it doesn't accidentally end up in the wikireplicas. Even though we currently don't have any way to replicate from x1, we might set that up later, or the table might get accidentally created in core databases.