prefUpdate schema contains multiple identical events for the same preference update
Open, Needs Triage, Public

Description

(Schema page for reference: https://meta.wikimedia.org/wiki/Schema:PrefUpdate )

I just happened upon this observation by @Halfak from February 2017:

I did some queries and this schema is obviously broken. It's common to see a "prefUpdate" show up multiple times for the exact same value with the same timestamp. So it's hard to know when a change actually took place and when it did not. So, I think I'll be declining this ticket [about using the PrefUpdate schema for a particular instrumentation for ORES].

Since the web team currently relies on this schema (cf. T212516, T211197), I checked whether this is still an issue. There were indeed still many such duplicate events as of last month, although they seem to be concentrated in particular properties (compare also this list of the 200 most frequent properties in the schema overall):

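-- Per property, count the distinct event payloads that occur more than once.
SELECT event.property AS property, COUNT(*) AS duplicated_events
FROM (
  -- Group fully identical event payloads and keep those with several copies.
  SELECT event, COUNT(*) AS copies
  FROM event.prefupdate
  WHERE year = 2019 AND month = 2
  GROUP BY event
  HAVING copies > 1) AS events
GROUP BY event.property
ORDER BY property LIMIT 10000;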
property	duplicated_events
cx	115
date	1
echo-subscriptions-email-article-linked	1
echo-subscriptions-email-edit-thank	1
echo-subscriptions-email-login-fail	1
echo-subscriptions-email-login-success	1
echo-subscriptions-email-mention	1
echo-subscriptions-email-oauth-owner	1
echo-subscriptions-email-page-review	1
echo-subscriptions-web-article-linked	2
echo-subscriptions-web-mention-failure	1
echo-subscriptions-web-mention-success	1
echo-subscriptions-web-reverted	1
editsectiononrightclick	1
enotifminoredits	1
enotifwatchlistpages	1
fileexporter	285
gender	1
lqt-watch-threads	32
lqtdisplaycount	266
lqtdisplaydepth	266
lqtnotifytalk	306
minordefault	68
php7	59
popups	1
prefershttps	9540
templatewizard-betafeature	3105
timecorrection	3
twocolconflict	192
uls-preferences	26
usecodemirror	6
visualeditor-betatempdisable	165
visualeditor-enable	705
visualeditor-findAndReplace-findText	13
visualeditor-findAndReplace-replaceText	6
visualeditor-newwikitext	3207
visualeditor-visualdiffpage	185
Time taken: 85.171 seconds, Fetched: 37 row(s)
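
As written, the query above counts, per property, the number of distinct payloads that occur more than once. It does not count the surplus copies themselves. A minimal Python sketch of the same grouping on an in-memory list of events, which also reports the surplus copies and shows one possible way to collapse exact duplicates, could look like this. The function names are hypothetical, and the event field names used with it (userId, property, value, saveTimestamp) are assumptions for illustration, not taken from the query:

```python
from collections import Counter


def _payload_key(event):
    # Identical payloads map to the same hashable key (field order is ignored).
    return tuple(sorted(event.items()))


def duplicate_summary(events):
    """Return {property: (duplicated_payloads, surplus_copies)}.

    duplicated_payloads mirrors the query's duplicated_events column;
    surplus_copies is the number of events beyond the first of each payload.
    """
    copies = Counter(_payload_key(e) for e in events)
    summary = {}
    for payload, n in copies.items():
        if n < 2:
            continue  # same filter as HAVING copies > 1
        prop = dict(payload)["property"]
        groups, surplus = summary.get(prop, (0, 0))
        summary[prop] = (groups + 1, surplus + n - 1)
    return summary


def deduplicate(events):
    """Keep the first occurrence of each identical payload, preserving order."""
    seen = set()
    unique = []
    for e in events:
        key = _payload_key(e)
        if key not in seen:
            seen.add(key)
            unique.append(e)
    return unique
```

Collapsing exact duplicates only removes events that are identical in every field. It cannot tell whether a repeated event with a different timestamp reflects a real preference change.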

Event Timeline

Tbayer moved this task from Triage to Tracking on the Product-Analytics board.

(Tagging this with Analytics considering general EL code stewardship and the current schema maintainer, although I honestly don't know who is in the best position to fix this.)

Tbayer updated the task description. Mar 21 2019, 12:13 AM

I used data from this schema in T216185. My experience was similar to what @Tbayer mentions, in that some preferences appear to have issues with duplication and some do not. In this case the echo-notifications-blacklist and email-blacklist preferences were affected. The issue also seems to have changed over time: some users logged lots of preference changes during certain periods, and not during others.

This schema has no owner and it's the owner who has to correct the instrumentation.
cc @jlinehan fyi

fdans moved this task from Incoming to Radar on the Analytics board. Mar 25 2019, 3:59 PM