
API maxlag stats
Open, Normal, Public

Description

It would be good to have maxlag parameter values and rejection rates in statsd, to make it more obvious what is going on when there is replication lag, as in T198049.

About 10% of all API requests contain a maxlag parameter, so the rate would be substantial. Most of these are due to client libraries sending maxlag=5 by default. It's not really necessary for server protection to use maxlag for read actions, but I'm not sure if it's worth trying to change the policy.

We could break it down by module, say api.<module>.maxlag.accepted and api.<module>.maxlag.rejected. That way we could correlate the action=edit statistics with any observed drop in edit rate. We could also send the maxlag parameter value as the statsd value, so that we can predict what the rejection rate will be for any given replication lag.
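A minimal sketch of what this could look like, not the actual MediaWiki implementation. It formats one statsd line per request using the metric names proposed above, then estimates the rejection rate for a given lag from a sample of recorded maxlag values. The function names, the choice of the statsd timing type (`ms`), and the rule that a request is rejected when the lag exceeds its maxlag are assumptions made for illustration.

```python
from bisect import bisect_left


def maxlag_metric(module: str, maxlag: float, accepted: bool) -> str:
    """Format one statsd line for a request that carried a maxlag parameter."""
    outcome = "accepted" if accepted else "rejected"
    # "ms" (timing) is an assumed metric type, chosen so that the backend
    # keeps a distribution of the values rather than only a count.
    return f"api.{module}.maxlag.{outcome}:{maxlag:g}|ms"


def predicted_rejection_rate(maxlag_values: list[float], lag: float) -> float:
    """Estimate the fraction of requests that would be rejected at this lag."""
    if not maxlag_values:
        return 0.0
    ordered = sorted(maxlag_values)
    # Assumption: a request is rejected when the lag exceeds its maxlag,
    # so count the recorded values that are strictly below the lag.
    return bisect_left(ordered, lag) / len(ordered)


if __name__ == "__main__":
    print(maxlag_metric("edit", 5, accepted=True))
    print(predicted_rejection_rate([5, 5, 5, 10, 30], lag=7))
```

With most clients sending maxlag=5, the estimate would jump sharply as soon as the lag passes 5 seconds, which is the kind of effect these stats are meant to make visible.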

Event Timeline

tstarling created this task. Aug 8 2018, 2:07 AM
Restricted Application added a subscriber: Aklapper. Aug 8 2018, 2:07 AM

Note that since rMWde9f9bda7db9: API: Optionally include in job queue size in maxlag and rMW5a7e9ba954c9: Introduce ApiMaxLagInfo hook, the maxlag value reported by the API may be in pseudoseconds, which factor in other metrics instead of just database replication lag.

> It's not really necessary for server protection to use maxlag for read actions, but I'm not sure if it's worth trying to change the policy.

I think it's more foolproof for framework authors to just send maxlag=5 with every request, instead of having to determine which modules might have write side-effects (yes, they could use the paraminfo API, but that costs another API request and adds complexity).

If we don't care about maxlag for read modules, then we should consider ignoring the parameter for those modules.
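For illustration, the framework-side approach mentioned in this comment (sending maxlag=5 with every request, regardless of module) could be as small as the sketch below. The helper name `with_default_maxlag` is hypothetical and not taken from any particular client library.

```python
def with_default_maxlag(params: dict[str, str], maxlag: int = 5) -> dict[str, str]:
    """Return a copy of the request parameters with maxlag added by default."""
    merged = dict(params)
    # Keep an explicit maxlag supplied by the caller; otherwise add the default.
    merged.setdefault("maxlag", str(maxlag))
    return merged


if __name__ == "__main__":
    print(with_default_maxlag({"action": "query", "titles": "Main Page"}))
    print(with_default_maxlag({"action": "edit", "maxlag": "2"}))
```

The client never has to know whether a module has write side-effects, which is why read requests end up carrying maxlag as well.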

CCicalese_WMF triaged this task as Normal priority. Aug 14 2018, 2:00 AM
CCicalese_WMF moved this task from Inbox to CPT TEC1 Backlog on the Core-Platform-Team-Old board.

I don't see any particular reason to ignore maxlag for read modules. If the read is setting up for a write, it could be considered nice that the client gets the error at the start instead of preparing the whole edit only to find it fails in the end.