Notification dashboards miscount time-to-read
Closed, ResolvedPublic

Description

The "distribution of response times" and "distribution of unread notifications" dashboards miscount the time it takes for a notification to be read. They (understandably) treat the difference between two MediaWiki timestamps as a time difference in seconds. Sadly, that isn't true: MediaWiki timestamps are just integers (technically, BINARY strings) that look like timestamps.
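To illustrate the bug, here is a minimal Python sketch. It is not the dashboards' actual query, which is SQL. It assumes timestamps in MediaWiki's 14-digit `YYYYMMDDHHMMSS` form, in UTC, and the helper names are hypothetical.

```python
from datetime import datetime, timezone

MW_FORMAT = "%Y%m%d%H%M%S"  # MediaWiki's 14-digit timestamp layout (assumed UTC)


def naive_seconds(sent: str, read: str) -> int:
    """The buggy approach: subtract the timestamps as if they were plain integers."""
    return int(read) - int(sent)


def actual_seconds(sent: str, read: str) -> int:
    """Parse both timestamps as UTC datetimes and return the real gap in seconds."""
    start = datetime.strptime(sent, MW_FORMAT).replace(tzinfo=timezone.utc)
    end = datetime.strptime(read, MW_FORMAT).replace(tzinfo=timezone.utc)
    return int((end - start).total_seconds())


# A notification read one second after it was sent, across a day boundary.
sent, read = "20160330235959", "20160331000000"
print(naive_seconds(sent, read))   # 764041
print(actual_seconds(sent, read))  # 1
```

The integer difference is wrong because each "digit group" (seconds, minutes, hours, days) rolls over at 60, 60, 24, or the month length rather than at a power of 10.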

I'm working on a fix for this; we'll have to rerun the queries for past periods as well.

Event Timeline

Change 280371 had a related patch set uploaded (by Neil P. Quinn-WMF):
Fix calculation of time-to-read

https://gerrit.wikimedia.org/r/280371

nshahquinn-wmf raised the priority of this task from High to Needs Triage. Mar 31 2016, 12:31 AM
nshahquinn-wmf moved this task from Backlog to Neil's in progress on the Contributors-Analysis board.

In addition to fixing the calculation, I also want to stop calculating the "distribution of unread notifications" graph because it's expensive to calculate but super easy to derive from the "distribution of response time" graph (it's just the 30_plus line divided by the sum of all the lines).
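A minimal Python sketch of that derivation, assuming each row of the response-time report can be read as a mapping from bucket name to notification count and that the `30_plus` bucket is the one that stands in for unread notifications, as described above. The other bucket names and all counts are made up for illustration.

```python
from typing import Mapping


def unread_share(row: Mapping[str, int], unread_bucket: str = "30_plus") -> float:
    """Fraction of a period's notifications that fall in the unread bucket."""
    total = sum(row.values())
    if total == 0:
        return 0.0  # no notifications sent in this period
    return row[unread_bucket] / total


# Hypothetical bucket names and counts for one month.
month = {"0_1": 500, "1_2": 100, "2_30": 50, "30_plus": 350}
print(unread_share(month))  # 0.35
```

Because the share is computed from columns that the response-time report already contains, no extra query over the notification tables is needed.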

@jmatazzoni, do you have any objection?

Change 280371 merged by Mforns:
Fix calculation of time-to-read

https://gerrit.wikimedia.org/r/280371

Thanks @mforns!

It looks like the new data already exists on stat1003 in /a/limn-public-data/metrics/echo, but the copy of the repo in /srv/limn-ee-data doesn't have the new commits yet. Does reportupdater run on a different copy of the repo?

Also, the dashboard is still showing the old data. How can I switch it over?

@Neil_P._Quinn_WMF
reportupdater now runs on /a/reportupdater/jobs/limn-ee-data.
Regarding the old data showing in the dashboard, limn must be configured to point to the new report file. Let me look into this.

@Neil_P._Quinn_WMF

The configuration for the dashboard:
http://ee-dashboard.wmflabs.org/dashboards/enwiki-features
lives in:
https://github.com/wikimedia/limn-editor-engagement-data/blob/master/dashboards/enwiki-features.json#L27

We should replace this line, which points to the old report file:
"http://datasets.wikimedia.org/limn-public-data/metrics/echo/distribution_of_unread_notifications/enwiki.tsv?name=Distribution of unread notifications",
by this line that points to the new report file:
"http://datasets.wikimedia.org/limn-public-data/metrics/echo/days_to_read/enwiki.tsv?name=Distribution of unread notifications",
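A minimal sketch of that edit, assuming a local checkout of the `limn-editor-engagement-data` repository. It treats `dashboards/enwiki-features.json` as plain text and only swaps the report path, so it makes no assumptions about the file's JSON structure. The function names are hypothetical.

```python
from pathlib import Path

OLD = "metrics/echo/distribution_of_unread_notifications/enwiki.tsv"
NEW = "metrics/echo/days_to_read/enwiki.tsv"


def repoint_dashboard(text: str) -> str:
    """Return the dashboard config with the old report path replaced by the new one."""
    if OLD not in text:
        raise ValueError("old report path not found; nothing to change")
    return text.replace(OLD, NEW)


def update_file(path: Path) -> None:
    """Rewrite the config file in place with the new report path."""
    text = path.read_text(encoding="utf-8")
    path.write_text(repoint_dashboard(text), encoding="utf-8")
```

From the repository root, `update_file(Path("dashboards/enwiki-features.json"))` would apply the change; the `?name=` label is left untouched, so the graph keeps its title.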

This repository is very old and is not handled by Gerrit. Do you want me to create a pull request on GitHub? I'm also not sure whether, once it is merged, it will be deployed automatically or another step is needed.

Are you sure it's not https://github.com/wikimedia/analytics-limn-ee-data ? That one is in Gerrit...

@Catrope

Yes, this one holds the queries for the reports to be generated, but the dashboard configuration is in
https://github.com/wikimedia/limn-editor-engagement-data/blob/master/dashboards/enwiki-features.json#L27

@mforns, I just created a pull request to change the dashboard.

Also, once the switchover is complete, I want to delete the old data files since they're incorrect. These files (on stat1003) are the ones I want, right?

/srv/limn-public-data/metrics/echo/distribution_of_response_time/
/srv/limn-public-data/metrics/echo/distribution_of_unread_notifications/

@Neil_P._Quinn_WMF

Yes, I can delete the files. Please let me know when the switchover is done.

@Neil_P._Quinn_WMF

BTW, the pull request looks good to me! I can't merge it, but if you can, please go ahead.

@Neil_P._Quinn_WMF

And one more comment :P
The change in T132463 recently moved datasets.wikimedia.org to HTTPS only, so if you want to avoid the redirect, you can also change the URL to https.

@mforns, I added another pull request for that. It looks like Dan has merged both.

The dashboard now shows the new data. You can go ahead and delete the old data files.

For posterity's sake, here is the new and old data for enwiki. Surprisingly, this bug didn't make as big a difference as I expected.

@Neil_P._Quinn_WMF

Cool! I removed the files from stat1003.

It seems significantly different to me. The proportions may be similar, but the actual values differ quite a lot, no?

To add some context: I guess we were looking at it from the percentage perspective, since one recent focus of discussion was the last graph ("Notifications by days taken to read").

With the original data, we found that of the notifications sent in a month, ~50% were read within a couple of days while ~50% remained unread at the end of the month (which could affect how pending work accumulates from month to month). With the corrected numbers, the split seems to be ~60% of notifications read within 2 days vs. ~40% remaining unread at the end of the month.

This ~10-percentage-point change was smaller than we expected, given that notifications were ignored when they were read as part of a bundle. This may mean that there are fewer bundles than we thought, or that they are outnumbered by non-bundled notifications.

@Pginer-WMF, actually, for this graph, bundled notifications are counted the same as any other notification, and marked as read when the bundle is read (which is different from how I did it in T125180).

Seems done to me. Let me know if I've missed something.