
Welcome Survey: improve survey aggregation performance
Closed, Resolved, Public

Description

With T290582, the Growth features are on most wikis. The Welcome Survey aggregation notebook designed for T275172 was written when we were on only a handful of wikis, and as a result it queries ServerSideAccountCreation once for every wiki. While the cluster is fast and data gets cached, this is unnecessary and slow.

Improve the aggregation performance by querying ServerSideAccountCreation once at startup, then reusing that dataset when iterating through the wikis.
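
A minimal sketch of the "query once, reuse per wiki" pattern, in plain Python. This is not the notebook's actual code: `query_account_creations`, the row fields, and the sample rows are hypothetical stand-ins for the real ServerSideAccountCreation query, and the real notebook presumably runs against the cluster rather than an in-memory list.

```python
from collections import defaultdict

# Hypothetical stand-in for the expensive ServerSideAccountCreation query.
# The rows and field names below are invented for illustration only.
QUERY_COUNT = 0


def query_account_creations():
    """Pretend to run the expensive query and count how often it is called."""
    global QUERY_COUNT
    QUERY_COUNT += 1
    return [
        {"wiki": "cswiki", "user_id": 1},
        {"wiki": "cswiki", "user_id": 2},
        {"wiki": "kowiki", "user_id": 3},
        {"wiki": "viwiki", "user_id": 4},
    ]


def group_by_wiki(rows):
    """Index the rows by wiki so that each per-wiki lookup is cheap."""
    grouped = defaultdict(list)
    for row in rows:
        grouped[row["wiki"]].append(row)
    return grouped


def aggregate_all(wikis):
    """Query once at startup, then reuse the result for every wiki."""
    registrations = group_by_wiki(query_account_creations())
    # .get() avoids creating empty entries for wikis with no rows.
    return {wiki: len(registrations.get(wiki, [])) for wiki in wikis}


counts = aggregate_all(["cswiki", "kowiki", "viwiki", "arwiki"])
print(counts)
print(QUERY_COUNT)
```

The point of the sketch is that the number of expensive queries stays at one no matter how many wikis are in the list, whereas the per-wiki approach grows linearly with the number of wikis.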

Event Timeline

Tgr subscribed.

@nettrom_WMF I'm moving this to Triaged for now, please feel free to move it to a more appropriate column/board if there is one.

The aggregation notebook has been updated to cache the specific subset of ServerSideAccountCreation that we're querying repeatedly, and then query said cached subset when iterating over each language edition. Initial testing showed the first iteration takes about the time one would expect, while subsequent queries return data in a few seconds. Now waiting for the early December aggregation to see if everything's working as it should.

I found a bug in the aggregation notebook when I checked the log for the November aggregation. That bug has been fixed, and the notebook has been run from start to finish to verify that it works as it should.

I'll of course check the logs after the December aggregation and make any changes necessary, but for now this works and it's time to close this task as resolved.