Page MenuHomePhabricator

Measure the notification types that are most abundantly received at 5 sample wikis
Closed, ResolvedPublic

Description

In order to better understand Echo usage, and what type of notifications various editors are accustomed to receiving, we'd like to get:

  • A list of the 50 (?) users who received the most notifications, over the last 90 (?) days
  • with numbers given for each specific notification-type that they recieved, plus the total.
  • (One each for Enwiki, Meta, Commons, Frwiki, Hewiki (?) - 5 examples chosen to give a variety of scales, and because certain extensions that use Echo are only actively used at some wikis.)

e.g. 5 tables, with 50 rows each, something like this:

---------------------------------------------------------------------------
| username | total notifications | mention | edit-user-talk | page-linked | (...)
---------------------------------------------------------------------------
| alice    |         987         |   98    |      42        |     2       | (...)
---------------------------------------------------------------------------
| bob      |         654         |   76    |      23        |     19       | (...)
---------------------------------------------------------------------------
...

The complete list of notification types (potentially on these wikis) that we'd like to measure is:

edit-user-talk, reverted, page-linked, mention, user-rights, pagetriage-mark-as-reviewed, pagetriage-add-maintenance-tag, pagetriage-add-deletion-tag, ep-course-talk-notification, edit-thank, flow-thank, flow-new-topic, flow-post-reply, flow-post-edited, flow-topic-renamed, flow-mention, gather-hide, gather-unhide, gather-approve,


See also:

Event Timeline

Quiddity raised the priority of this task from to Needs Triage.
Quiddity updated the task description. (Show Details)
Quiddity added a project: Notifications.
Quiddity added a subscriber: Quiddity.
Restricted Application added a subscriber: Aklapper. · View Herald Transcript

Notifications are private, so I don't think we should be associating users with what notifications they are receiving.

Notifications are private, so I don't think we should be associating users with what notifications they are receiving.

We couldn't publish this data, but if people (ping @Pginer-WMF) think this is useful information, we could still get it and give it to them.

Catrope set Security to None.

Getting a picture of how are things for the 50 most busy users seems interesting to have a better idea of that end of the spectrum (I'd add how many of those notifications are read).

I understand the current ticker as a one-time data capture to learn from, as opposed to a periodic source of information to help us measure our improvements (but I may be missing something). Although the above is definitely interesting I'd start by these other measurements:

The reason is that they gave us a broader picture (1st one) and a goal to optimise (2nd one). In addition to represent anonymously aggregated data we can share easier.

@Quiddity, @Pginer-WMF, do you need user names? If I leave that out, that will reduce the sensitivity of the data.

In T113664#1714000, @Neil_P._Quinn_WMF wrote:

@Quiddity, @Pginer-WMF, do you need user names? If I leave that out, that will reduce the sensitivity of the data.

I'm totally ok with anonymised data. Since @Quiddity created the ticker, he may have more details on this.

Anonymized data shouldn't be blocked on the analytics-store task, should it? You wouldn't have to join against the user table AFAICT

Good point. Since it's not actually blocked, I should be able to work on it tomorrow.

Lovely, rather brute force SQL:

SELECT 
   COUNT(*) as "total notifications",
   SUM( IF( event_type = "edit-user-talk", 1, 0 ) ) as "edit-user-talk",
   SUM( IF( event_type = "reverted", 1, 0 ) ) as "reverted",
   SUM( IF( event_type = "page-linked", 1, 0 ) ) as "page-linked",
   SUM( IF( event_type = "mention", 1, 0 ) ) as "mention",
   SUM( IF( event_type = "user-rights", 1, 0 ) ) as "user-rights",
   SUM( IF( event_type = "pagetriage-mark-as-reviewed", 1, 0 ) ) as "pagetriage-mark-as-reviewed",
   SUM( IF( event_type = "pagetriage-add-maintenance-tag", 1, 0 ) ) as "pagetriage-add-maintenance-tag",
   SUM( IF( event_type = "pagetriage-add-deletion-tag", 1, 0 ) ) as "pagetriage-add-deletion-tag",
   SUM( IF( event_type = "ep-course-talk-notification", 1, 0 ) ) as "ep-course-talk-notification",
   SUM( IF( event_type = "edit-thank", 1, 0 ) ) as "edit-thank",
   SUM( IF( event_type = "flow-thank", 1, 0 ) ) as "flow-thank",
   SUM( IF( event_type = "flow-new-topic", 1, 0 ) ) as "flow-new-topic",
   SUM( IF( event_type = "flow-post-reply", 1, 0 ) ) as "flow-post-reply",
   SUM( IF( event_type = "flow-post-edited", 1, 0 ) ) as "flow-post-edited",
   SUM( IF( event_type = "flow-topic-renamed", 1, 0 ) ) as "flow-topic-renamed",
   SUM( IF( event_type = "flow-mention", 1, 0 ) ) as "flow-mention",
   SUM( IF( event_type = "gather-hide", 1, 0 ) ) as "gather-hide",
   SUM( IF( event_type = "gather-unhide", 1, 0 ) ) as "gather-unhide",
   SUM( IF( event_type = "gather-approve", 1, 0 ) ) as "gather-approve"
FROM echo_event
INNER JOIN echo_notification
ON event_id = notification_event
WHERE 
   notification_timestamp >= "20150701000000" AND
   notification_timestamp < "20151001000000"
GROUP BY notification_user
ORDER BY `total notifications` DESC
LIMIT 50;

Results:

[contains no identifying or private user data]

Initial impressions:

  • page-linked is the most common at every wiki except frwiki, where mention dominates
  • edit-thank is common at enwiki but almost nonexistent at the other four
  • There was only one mention among all 50 users at Commons

I'm wondering whether bots account for any of these results; for some reason Meta's echo tables are on analytics-store, so for it (but not for the others) I was able to look for bots. I didn't find any in the top 50 (the most notified bot at Meta had 5 notifications in the 3 months I'm looking at). However, bots are likely to be a lot more active at the other 4 wikis, so I'd be interested in repeating this for them when T115275 is finished.

Bot checking on Meta:

SELECT
   COUNT(*) as "total notifications"
FROM echo_notification
LEFT JOIN user_groups
ON ug_user = notification_user
WHERE 
   notification_timestamp >= "20150701000000" AND
   notification_timestamp < "20151001000000" AND
   ug_group = "bot"
GROUP BY notification_user
ORDER BY `total notifications` DESC
LIMIT 100;

total notifications
5
2
2
1
1
1
1

Let me know if there's more analysis or exploration you'd like me to do.

for some reason Meta's echo tables are on analytics-store

There's some <s>hysterical</s> historical reasons for that, metawiki, labswiki, mediawikiwiki, and officewiki also have local echo_event tables that aren't on extension1 because they had Echo deployed early IIRC.

Results:

[contains no identifying or private user data]

Pasted into https://docs.google.com/spreadsheets/d/1qug9B85m06FnhDMpP5JXqgui77NcAro5XW8hJAU_qDI/edit#gid=579103758 for easier browsing

nshahquinn-wmf raised the priority of this task from High to Needs Triage.Mar 30 2018, 10:14 AM
nshahquinn-wmf moved this task from Neil's in progress to Done on the Contributors-Analysis board.