
prefUpdate schema contains multiple identical events for the same preference update
Open, Medium, Public

Description

(Schema page for reference: https://meta.wikimedia.org/wiki/Schema:PrefUpdate )

I just happened upon this observation by @Halfak from February 2017:

I did some queries and this schema is obviously broken. It's common to see a "prefUpdate" show up multiple times for the exact same value with the same timestamp. So it's hard to know when a change actually took place and when it did not. So, I think I'll be declining this ticket [about using the PrefUpdate schema for a particular instrumentation for ORES].

Since the web team is currently relying on this schema (cf. T212516, T211197), I checked whether this is still an issue. There are indeed still lots of such duplicate events as of last month, although they seem to be concentrated in particular properties (compare also this list of the 200 most frequent properties in the schema overall):

SELECT event.property AS property, COUNT(*) AS duplicated_events
FROM (
  SELECT event, COUNT(*) AS copies
  FROM event.prefupdate 
  WHERE year = 2019 AND month = 2
  GROUP BY event
  HAVING copies > 1) AS events
GROUP BY event.property
ORDER BY property LIMIT 10000;
property	duplicated_events
cx	115
date	1
echo-subscriptions-email-article-linked	1
echo-subscriptions-email-edit-thank	1
echo-subscriptions-email-login-fail	1
echo-subscriptions-email-login-success	1
echo-subscriptions-email-mention	1
echo-subscriptions-email-oauth-owner	1
echo-subscriptions-email-page-review	1
echo-subscriptions-web-article-linked	2
echo-subscriptions-web-mention-failure	1
echo-subscriptions-web-mention-success	1
echo-subscriptions-web-reverted	1
editsectiononrightclick	1
enotifminoredits	1
enotifwatchlistpages	1
fileexporter	285
gender	1
lqt-watch-threads	32
lqtdisplaycount	266
lqtdisplaydepth	266
lqtnotifytalk	306
minordefault	68
php7	59
popups	1
prefershttps	9540
templatewizard-betafeature	3105
timecorrection	3
twocolconflict	192
uls-preferences	26
usecodemirror	6
visualeditor-betatempdisable	165
visualeditor-enable	705
visualeditor-findAndReplace-findText	13
visualeditor-findAndReplace-replaceText	6
visualeditor-newwikitext	3207
visualeditor-visualdiffpage	185
Time taken: 85.171 seconds, Fetched: 37 row(s)
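
For readers who want to reproduce the counting logic on a local sample, here is a minimal Python sketch of what the query above computes. The field names (property, value, saveTimestamp) and the sample rows are assumptions for illustration, not real data. Note that, like the query, it counts distinct duplicated payloads per property, not the number of surplus rows.

```python
from collections import Counter


def duplicated_payloads_by_property(events):
    """Count, per property, the distinct event payloads seen more than once."""
    # Freeze each event dict so that identical payloads compare and hash alike.
    copies = Counter(tuple(sorted(e.items())) for e in events)
    per_property = Counter()
    for payload, n in copies.items():
        if n > 1:  # mirrors HAVING copies > 1
            per_property[dict(payload)["property"]] += 1
    return dict(per_property)


# Hypothetical sample: one payload logged three times, one logged once.
sample = [
    {"property": "prefershttps", "value": "0", "saveTimestamp": "2019-02-01T00:00:00Z"},
    {"property": "prefershttps", "value": "0", "saveTimestamp": "2019-02-01T00:00:00Z"},
    {"property": "prefershttps", "value": "0", "saveTimestamp": "2019-02-01T00:00:00Z"},
    {"property": "gender", "value": "female", "saveTimestamp": "2019-02-01T00:05:00Z"},
]

print(duplicated_payloads_by_property(sample))
```

In this sample the triplicated prefershttps payload contributes 1 to the count, so the figures in the table are best read as a lower bound on the number of redundant rows.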

Event Timeline

Tbayer moved this task from Triage to Tracking on the Product-Analytics board.

(Tagging this with Analytics considering general EL code stewardship and the current schema maintainer, although I honestly don't know who is in the best position to fix this.)

Tbayer updated the task description. Mar 21 2019, 12:13 AM

I used data from this schema in T216185. My experience was similar to what @Tbayer mentions in that some preferences appear to have issues with duplication, and some do not. In this case the echo-notifications-blacklist and email-blacklist preferences were affected. It also seems that the issue changed over time, in other words that some users logged lots of preference changes during certain periods, and not during others.

This schema has no owner and it's the owner who has to correct the instrumentation.
cc @jlinehan fyi

fdans moved this task from Incoming to Radar on the Analytics board. Mar 25 2019, 3:59 PM
Aklapper edited projects, added Analytics-Radar; removed Analytics. Jun 10 2020, 6:33 AM

In T249386, I used prefupdate data to determine the opt-in and opt-out rate of the discussiontools-betaenable property. While reviewing the data, I found there were 100 duplicate events recorded for this property between March 31 and June 26th.
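
As an illustration of why the duplicates matter for rate calculations like this one, here is a hedged Python sketch that collapses exactly identical events before computing an opt-in share. The field names (userId, property, value, saveTimestamp), the convention that "1" means opt-in, and the sample rows are all assumptions for the example. Treating identical payloads as a single update is itself an assumption about how the duplicates arise.

```python
def dedupe_events(events):
    """Keep the first copy of each exactly identical event, preserving order."""
    seen = set()
    unique = []
    for e in events:
        key = tuple(sorted(e.items()))
        if key not in seen:
            seen.add(key)
            unique.append(e)
    return unique


def opt_in_share(events, prop):
    """Share of deduplicated updates to `prop` whose value is "1" (assumed opt-in)."""
    updates = [e for e in dedupe_events(events) if e["property"] == prop]
    if not updates:
        return None
    return sum(e["value"] == "1" for e in updates) / len(updates)


# Hypothetical sample: user 1's opt-in was logged twice, user 2 opted out once.
sample = [
    {"userId": 1, "property": "discussiontools-betaenable", "value": "1",
     "saveTimestamp": "2020-04-01T10:00:00Z"},
    {"userId": 1, "property": "discussiontools-betaenable", "value": "1",
     "saveTimestamp": "2020-04-01T10:00:00Z"},
    {"userId": 2, "property": "discussiontools-betaenable", "value": "0",
     "saveTimestamp": "2020-04-02T09:30:00Z"},
]

print(opt_in_share(sample, "discussiontools-betaenable"))
```

Without deduplication the same sample would give two opt-ins out of three events, so the duplicated row inflates the apparent opt-in share.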

sdkim triaged this task as Medium priority. Sep 28 2020, 3:15 PM
sdkim moved this task from Inbox to Next on the Product-Data-Infrastructure board.
sdkim added a subscriber: sdkim. Sep 28 2020, 3:28 PM

Product Data Infrastructure, as the newly crowned owner of this schema, will be reviewing this and hopes to merge here soon.

mpopov added a subscriber: mpopov. Sep 28 2020, 3:32 PM
SELECT
  year, month,
  CONCAT_WS(', ', COLLECT_SET(event.property)) AS properties_affected,
  COUNT(1) AS duplicated_events
FROM (
  SELECT year, month, event, COUNT(1) AS copies
  FROM event.prefupdate 
  WHERE year = 2020
  GROUP BY year, month, event
  HAVING copies > 1
) AS events
GROUP BY year, month
ORDER BY year, month
LIMIT 1000000;
year	month	properties_affected	duplicated_events
2020	6	VectorSkinVersion, betafeatures-auto-enroll, discussiontools-betaenable, echo-notifications-blacklist, email-blacklist, growthexperiments-help-panel-tog-help-panel, growthexperiments-homepage-enable, growthexperiments-homepage-pt-link, mf_amc_optin, popups, popupsreferencepreviews, skin, mfMode	19
2020	7	VectorSkinVersion, betafeatures-auto-enroll, discussiontools-betaenable, echo-notifications-blacklist, email-blacklist, growthexperiments-help-panel-tog-help-panel, growthexperiments-homepage-enable, growthexperiments-homepage-pt-link, mfMode, mf_amc_optin, popups, popupsreferencepreviews, skin	23
2020	8	VectorSkinVersion, betafeatures-auto-enroll, discussiontools-betaenable, echo-notifications-blacklist, email-blacklist, growthexperiments-help-panel-tog-help-panel, growthexperiments-homepage-enable, growthexperiments-homepage-pt-link, mfMode, mf_amc_optin, popups, popupsreferencepreviews, skin	46
2020	9	VectorSkinVersion, betafeatures-auto-enroll, discussiontools-betaenable, echo-notifications-blacklist, email-blacklist, growthexperiments-help-panel-tog-help-panel, growthexperiments-homepage-enable, growthexperiments-homepage-pt-link, mfMode, mf_amc_optin, popups, popupsreferencepreviews, skin	516196

So…still an issue

I've been poking at this as time permits but haven't yet managed to track down a cause. Interestingly, eyeballing some data from yesterday, it seems to happen particularly often for popupsreferencepreviews events.

I'm curious to see whether this problem persists after the migration of this instrument to MEP is finished. (At time of writing, it's only using MEP on testwiki.) If nothing else, it'll give us some better metadata to use for investigation.

Mholloway added a comment. Edited Wed, Feb 17, 5:08 PM

OK, after running a couple of queries for the past couple of days, it looks like the worst offender is in fact mf_amc_optin, followed by popupsreferencepreviews, then discussiontools-betaenable, then a handful of others with a duplicate or two per day.

For example:

SELECT event.property AS property, COUNT(*) AS duplicated_events
FROM (
  SELECT event, COUNT(*) AS copies
  FROM event.prefupdate 
  WHERE year = 2021 AND month = 2 AND day = 15
  GROUP BY event
  HAVING COUNT(*) > 1) AS events
GROUP BY event.property
ORDER BY duplicated_events DESC;

returns:

property	duplicated_events
mf_amc_optin	477
popupsreferencepreviews	214
discussiontools-betaenable	59
skin	2
popups	2
VectorSkinVersion	1

The only factor I can find in common among the duplicated events is that the vast majority of them are coming from mobile user agents. (No particular platform or browser stands out, however.)

Change 665236 had a related patch set uploaded (by Mholloway; owner: Michael Holloway):
[mediawiki/extensions/MobileFrontend@master] Fix: Save user options only once when Advanced Mode is toggled

https://gerrit.wikimedia.org/r/665236

Jdlrobson added a subscriber: Jdlrobson.

I believe a review here is required from our team.

phuedx claimed this task. Mon, Feb 22, 6:03 PM
phuedx reassigned this task from phuedx to polishdeveloper. Tue, Feb 23, 4:29 PM
phuedx added a subscriber: phuedx.