PrefUpdate captures user preference modifications at registration
Closed, Resolved (Public)

Description

When checking which preferences are saved through PrefUpdate, I noticed that 5,285 events were saved for growthexperiments-homepage-enable on 2020-08-17. This seemed unusually high for an option that I'd expect to be toggled by humans.

Digging further into this, I examined one of the entries and noticed that the timestamp of the event in the Data Lake was a few seconds after the user's registration timestamp. To me, this indicates that the event was saved when the system set the user's Homepage preference to 1 because the user was not in the control group.

Previously, these types of events were not saved by PrefUpdate, meaning that the Growth team's process of excluding users who turned the Homepage on or off was straightforward: if a user was logged by this schema they'd changed the preference themselves. Now, we'd instead have to figure out some heuristic to remove these system events from the data. I'd prefer if we could either not log system-generated changes like these, or add some way to identify them so they can be excluded.

Event Timeline

One question came up when discussing this task in the Growth team: are we seeing this pattern for user preferences from other extensions besides GrowthExperiments?

I wrote the following Hive query to count the number of events recorded within 15 seconds of registration for the first 15 days of August 2020:

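-- Account creations in the first 15 days of August 2020
WITH ssac AS (
    SELECT wiki, event.userid, dt
    FROM event.serversideaccountcreation
    WHERE year = 2020
    AND month = 8
    AND day <= 15
),
-- PrefUpdate events over the same period
pu AS (
    SELECT wiki, event.userid, event.property, dt
    FROM event.prefupdate
    WHERE year = 2020
    AND month = 8
    AND day <= 15
)
SELECT pu.property, count(*) AS num_events
FROM ssac
-- The WHERE clause below discards rows without a matching PrefUpdate event,
-- so the original LEFT JOIN behaved as an inner join.
JOIN pu
ON ssac.wiki = pu.wiki
AND ssac.userid = pu.userid
-- Seconds between registration and the preference update.
-- There is no lower bound, so an event logged before registration would also match.
WHERE (unix_timestamp(pu.dt, "yyyy-MM-dd'T'HH:mm:ss'Z'") -
       unix_timestamp(ssac.dt, "yyyy-MM-dd'T'HH:mm:ss'Z'") < 15)
GROUP BY pu.property
ORDER BY num_events DESC
LIMIT 250;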

Looking at the output below, two clear groups of preferences are affected. The first group is VectorSkinVersion, popups, mf_amc_optin, and skin, which the Reading Web team is tracking. The second is the GrowthExperiments preferences. I've removed the rest of the output, as those rows are for properties that are no longer tracked by PrefUpdate.

property	num_events
VectorSkinVersion	563176
popups	516305
mf_amc_optin	475338
skin	475209
growthexperiments-homepage-enable	42444
growthexperiments-homepage-pt-link	42444
growthexperiments-help-panel-tog-help-panel	42444

In conclusion, the GrowthExperiments properties are affected, and some of the Reading Web team's properties are affected.
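To illustrate the kind of heuristic we'd otherwise need for removing these system events, here is a minimal Python sketch of the same timing comparison as the query above. It is only an assumption about how such a filter could look, not something that exists in our pipelines: the function names, the in-memory data shapes, and the 15-second window are all hypothetical. Unlike the Hive query, it also requires the event to come at or after registration.

```python
from datetime import datetime, timedelta, timezone

# Same timestamp shape as the dt strings parsed in the Hive query.
DT_FORMAT = "%Y-%m-%dT%H:%M:%SZ"


def parse_dt(value):
    """Parse a UTC timestamp string such as '2020-08-17T12:00:03Z'."""
    return datetime.strptime(value, DT_FORMAT).replace(tzinfo=timezone.utc)


def split_events(registrations, pref_events, window_seconds=15):
    """Split PrefUpdate events into (likely_system, likely_user).

    registrations: dict mapping (wiki, userid) -> registration dt string
    pref_events: iterable of dicts with keys wiki, userid, property, dt

    Hypothetical heuristic: an event is treated as likely system-generated
    when it falls in [registration, registration + window_seconds).
    """
    window = timedelta(seconds=window_seconds)
    likely_system, likely_user = [], []
    for ev in pref_events:
        reg = registrations.get((ev["wiki"], ev["userid"]))
        if reg is not None:
            delta = parse_dt(ev["dt"]) - parse_dt(reg)
            if timedelta(0) <= delta < window:
                likely_system.append(ev)
                continue
        # No known registration, or outside the window: keep as a user event.
        likely_user.append(ev)
    return likely_system, likely_user
```

A time window like this would misclassify any user who genuinely changes a preference within seconds of registering, which is part of why I'd rather not rely on it.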

This appears to have been caused by rEWMV30731c2c748a: PrefUpdate: Add property tracking filter, which removed the PrefUpdateInstrumentation::isKnownSettingsPage test prior to sending the PrefUpdate events. That test would pass if the current request was to a known settings page or the options API, which was being used as a proxy for whether the user was updating their preferences themselves. PrefUpdate events logged in that case are no longer distinguishable from those logged due to calls to User::saveSettings on the server-side.

My recommendation would be to reinstate that test but perhaps give it a more meaningful name so that its function is well understood, e.g. PrefUpdateInstrumentation::isUserInitiated.
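As a rough sketch of what that gate does, written in Python rather than the extension's PHP: the event is only logged when the request looks user-initiated, meaning it came through a known settings page or the options API. Everything here is hypothetical and for illustration only, including the request dict, the set of page names, and the function names. It is not the actual WikimediaEvents code.

```python
# Hypothetical set of settings pages; the real list lives in the extension.
KNOWN_SETTINGS_PAGES = {"Preferences", "MobileOptions"}


def is_user_initiated(request):
    """Return True if the request looks like a user changing their own preferences.

    request: hypothetical dict with optional keys 'special_page' and 'api_action'.
    """
    if request.get("api_action") == "options":
        return True
    return request.get("special_page") in KNOWN_SETTINGS_PAGES


def maybe_log_pref_update(request, property_name, value, log):
    """Append a PrefUpdate-style event to log only for user-initiated requests.

    Returns True if an event was logged, False if it was skipped.
    """
    if not is_user_initiated(request):
        # For example, a server-side save during account creation.
        return False
    log.append({"property": property_name, "value": value})
    return True
```

With a gate like this, a server-side save during registration carries neither marker and is skipped, while a save from a settings page or the options API is logged.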

/cc @Krinkle

Milimetric subscribed.

this is just instrumentation work, right?

That's correct.

Change 622172 had a related patch set uploaded (by Phuedx; owner: Phuedx):
[mediawiki/extensions/WikimediaEvents@master] PrefUpdate: Only log if the preference update is user-initiated

https://gerrit.wikimedia.org/r/622172

Change 622172 merged by jenkins-bot:
[mediawiki/extensions/WikimediaEvents@master] PrefUpdate: Only log if the preference update is user-initiated

https://gerrit.wikimedia.org/r/622172

sdkim claimed this task.