Delay rollout of constraint violation gadget to all users?
Closed, Declined (Public)

Description

Per T184069: Increase reach of constraint violation gadget, we were planning to start rolling out the constraint violation gadget to all logged-in users, beginning on March 1st with all users whose username starts with Z. But now that caching has been delayed yet again (see T184812), we might have to postpone this.

According to Special:GadgetUsage, checkConstraints is currently used by 238 active users. And according to this query, there are 235 active users (users who have edited in the last 30 days) whose username starts with Z. So it looks like if we don’t enable caching, we can expect roughly twice the load from wbcheckconstraints API requests for the first week of the rollout. I feel like this could still be okay, as long as we get caching in place before Y (229) and X (114) join the party a week later. But on the other hand, if there’s another problem with caching, pausing the rollout while it’s in progress is probably worse than just delaying it as a whole…
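For reference, the estimate above can be reproduced with a few lines of Python. This is only a rough sketch: the user counts are the ones quoted in this description, while the function name and the assumption that every active user generates the same number of wbcheckconstraints requests are mine.

```python
# Active-user counts quoted in the task description.
CURRENT_GADGET_USERS = 238
BATCHES = [("Z", 235), ("Y", 229), ("X", 114)]


def load_multipliers(baseline, batches):
    """Return (letter, multiplier) pairs after each batch is added.

    Assumption (not measured): every active user generates the same
    number of requests, so load scales linearly with the user count.
    """
    total = baseline
    result = []
    for letter, users in batches:
        total += users
        result.append((letter, total / baseline))
    return result


for letter, factor in load_multipliers(CURRENT_GADGET_USERS, BATCHES):
    print(f"after {letter}: {factor:.2f}x the current load")
```

Under that assumption the load would be about 1.99x after Z, and about 3.43x once Y and X have joined as well.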

Thoughts?

Event Timeline

We’ll try to get caching re-enabled today, and then hopefully we won’t have to delay the rollout.

So far I’m not seeing any increase in wbcheckconstraints requests (Grafana)… I guess users who go through the effort to enable the gadget are generally more active than arbitrarily selected users whose user names happen to begin with Z?

Usage has started to pick up slightly with the second batch (Grafana). We’ll try to enable caching again later today, so hopefully we won’t need to delay the rollout any further.

(For the record – I realize now that setting up this automatic rollout was irresponsible, and we should’ve instead done it via one config change per batch, so that we’re not sitting on a ticking bomb as the rollout continues automatically. The way this rollout is set up leaves no record in the Server Admin Log that the increase in requests could be connected to. Sorry.)

Okay, no further caching problems reported so far. Cache hit rate to date is somewhere between 20% and 30%, but I have reason to hope this will improve in the future: the cache has been enabled for less than one full TTL (one day), so it’s still warming up; higher gadget usage should hopefully mean more requests for the same entities; and sharing the same cache across languages should help a lot as well once we’re ready for that.
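To put the hit rate in perspective, here is a small sketch of how much of the request volume still has to be computed from scratch. The helper name is hypothetical, and it assumes a cache hit costs nothing compared to a full constraint check, which is a simplification.

```python
def uncached_fraction(hit_rate):
    """Fraction of requests that miss the cache and need a full check.

    Simplifying assumption: cache hits are treated as free.
    """
    if not 0.0 <= hit_rate <= 1.0:
        raise ValueError("hit_rate must be between 0 and 1")
    return 1.0 - hit_rate


# Hit rates reported in the comment above.
for hit_rate in (0.20, 0.30):
    share = uncached_fraction(hit_rate)
    print(f"hit rate {hit_rate:.0%}: {share:.0%} of requests still computed")
```

So at the current hit rate, roughly 70% to 80% of requests are still doing the full work.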