Page MenuHomePhabricator

F10. Investigate Limn dashboard for Flow, make sure it's giving us the data that we think it is
Closed, ResolvedPublic5 Story Points

Description

We count all the unique users that have done any "write" action. For example:

  • posted a new topic
  • replied to a topic or a post
  • edited a topic or a post
  • edited a board description
  • edited a topic summary
  • done any moderation action

Details

Related Gerrit Patches:
analytics/limn-flow-data : masterFix unique-users query
analytics/limn-flow-data : masterInclude all contributing users in 'unique-users' query

Event Timeline

DannyH created this task.Jul 22 2015, 6:16 PM
DannyH raised the priority of this task from to High.
DannyH updated the task description. (Show Details)
DannyH moved this task to Ready for next sprint on the Collaboration-Team-Triage board.
DannyH added a subscriber: DannyH.
Restricted Application added a subscriber: Aklapper. · View Herald TranscriptJul 22 2015, 6:16 PM
Catrope assigned this task to SBisson.Jul 22 2015, 6:51 PM
Catrope set Security to None.
SBisson updated the task description. (Show Details)Jul 22 2015, 8:53 PM
SBisson updated the task description. (Show Details)
DannyH renamed this task from Investigate Limn dashboard for Flow, make sure it's giving us the data that we think it is to F10. Investigate Limn dashboard for Flow, make sure it's giving us the data that we think it is.Jul 23 2015, 12:42 AM

Change 226524 had a related patch set uploaded (by Sbisson):
Include all contributing users in 'unique-users' query

https://gerrit.wikimedia.org/r/226524

SBisson updated the task description. (Show Details)Jul 23 2015, 1:03 PM

Change 226524 merged by Milimetric:
Include all contributing users in 'unique-users' query

https://gerrit.wikimedia.org/r/226524

Etonkovidova added a subscriber: Etonkovidova.EditedJul 24 2015, 1:05 AM

Two questions:

  1. All below actions are captured? Or the actions that taken into count are different?
MariaDB [testwiki]> select distinct rev_change_type from flow_revision;
+----------------------+
| rev_change_type      |
+----------------------+
| reply                |
| new-post             |
| create-header        |
| edit-header          |
| edit-post            |
| edit-title           |
| lock-topic           |
| create-topic-summary |
| restore-topic        |
| edit-topic-summary   |
| hide-topic           |
+----------------------+
11 rows in set (0.00 sec)
  1. Since there is a derived table, no index is used - is it ok?
MariaDB [enwiki]> explain  SELECT weekstart,  SUM(user) as unique_users   
FROM (  SELECT 1 as user,  DATE(DATE_SUB(
FROM_UNIXTIME((conv(substring(hex(rev_id),1,12),16,10)>>2)/1000), interval DAYOFWEEK( 
FROM_UNIXTIME((conv(substring(hex(rev_id),1,12),16,10)>>2)/1000)) day)) as weekstart
FROM flow_revision  
WHERE rev_user_wiki NOT IN ( 'testwiki', 'test2wiki' )  
GROUP BY rev_user_wiki, rev_user_id, rev_user_ip ) x  
WHERE weekstart < DATE_SUB(CURDATE(), interval DAYOFWEEK(NOW()) day)  
GROUP BY weekstart\G
*************************** 1. row ***************************
           id: 1
  select_type: PRIMARY
        table: <derived2>
         type: ALL
possible_keys: NULL
          key: NULL
      key_len: NULL
          ref: NULL
         rows: 206376
        Extra: Using where; Using temporary; Using filesort
*************************** 2. row ***************************
           id: 2
  select_type: DERIVED
        table: flow_revision
         type: index
possible_keys: NULL
          key: flow_revision_user
      key_len: 116
          ref: NULL
         rows: 206376
        Extra: Using where; Using index; Using temporary; Using filesort

All below actions are captured? Or the actions that taken into count are different?

No filtering on rev_change_type so all those actions are included plus a few others like delete and suppress. Essentially, everybody whose actions result in a db write is included.

Since there is a derived table, no index is used - is it ok?

I think this is ok. Maybe @EBernhardson or @Milimetric have a different opinion?

No index should be alright, many of the other limn queries (both flow and other projects) tend to be unoptimized as the existing indexes are the ones used in prod, not anything tailored to the actual queries run here.

Checked for writing-to-db events - they are counted correctly.

No results for the week of 2015-04-11?

MariaDB [testwiki]> SELECT weekstart,
    ->        SUM(user) as unique_users
    ->   FROM (
    ->         SELECT 1 as user,
    ->                DATE(DATE_SUB(FROM_UNIXTIME((conv(substring(hex(rev_id),1,12),16,10)>>2)/1000), interval DAYOFWEEK( FROM_UNIXTIME((conv(substring(hex(rev_id),1,12),16,10)>>2)/1000)) day)) as weekstart
    ->           FROM flow_revision
    ->          WHERE rev_user_wiki  IN ( 'testwiki' )
    ->          GROUP BY rev_user_wiki, rev_user_id, rev_user_ip
    ->        ) x
    ->  WHERE weekstart < DATE_SUB(CURDATE(), interval DAYOFWEEK(NOW()) day)
    ->  GROUP BY weekstart;
+------------+--------------+
| weekstart  | unique_users |
+------------+--------------+
| 2014-05-17 |            1 |
| 2015-04-25 |            2 |
| 2015-05-02 |            4 |
| 2015-05-30 |            4 |
| 2015-06-13 |            1 |
| 2015-06-20 |            1 |
| 2015-06-27 |            1 |
| 2015-07-04 |            2 |
+------------+--------------+
8 rows in set (0.01 sec)

Looking at this in the office -- we found that this version was still undercounting unique users. It was showing 2 users on Catalan from July 18-24, when it was actually at least 18 (verified by going to Catalan Flow boards and counting).

Roan figured out some issues -- putting this back to In Development so Roan can continue.

DannyH reassigned this task from SBisson to Catrope.Jul 27 2015, 11:19 PM
DannyH edited subscribers, added: SBisson; removed: Catrope.

Change 227398 had a related patch set uploaded (by Catrope):
Fix unique-users query

https://gerrit.wikimedia.org/r/227398

Ran the fixed query on testwiki

MariaDB [testwiki]> SELECT DATE_SUB(d, interval DAYOFWEEK(d)-1 day) AS Week,  
COUNT(DISTINCT rev_user_id, COALESCE(rev_user_ip, ''), rev_user_wiki) AS "Unique users" 
 FROM 
 (SELECT DATE(FROM_UNIXTIME((conv(substring(hex(rev_id),1,12),16,10)>>2)/1000)) AS d, rev_user_id, rev_user_ip, rev_user_wiki  FROM flow_revision 
  WHERE rev_user_wiki IN ('testwiki') ) AS x GROUP BY Week HAVING Week < DATE_SUB(CURDATE(), interval DAYOFWEEK(NOW())-1 day) 
ORDER BY Week;
+------------+--------------+
| Week       | Unique users |
+------------+--------------+
| 2014-05-18 |            1 |
| 2015-04-26 |            2 |
| 2015-05-03 |            6 |
| 2015-05-10 |            3 |
| 2015-05-31 |            5 |
| 2015-06-14 |            4 |
| 2015-06-21 |            6 |
| 2015-06-28 |            1 |
| 2015-07-05 |            6 |
| 2015-07-12 |            1 |
| 2015-07-19 |            4 |
+------------+--------------+
11 rows in set (0.01 sec)

which coincides to the previous test results(shifted by one day):

MariaDB [testwiki]> select DATE(DATE_SUB(FROM_UNIXTIME((conv(substring(hex(rev_id),1,12),16,10)>>2)/1000), interval DAYOFWEEK( FROM_UNIXTIME((conv(substring(hex(rev_id),1,12),16,10)>>2)/1000)) day)) as Day, count(distinct(rev_user_id)) as Users from flow_revision  group by Day;
+------------+-------+
| Day        | Users |
+------------+-------+
| 2014-05-17 |     1 |
| 2015-04-25 |     2 |
| 2015-05-02 |     5 |
| 2015-05-09 |     3 |
| 2015-05-30 |     5 |
| 2015-06-13 |     4 |
| 2015-06-20 |     6 |
| 2015-06-27 |     1 |
| 2015-07-04 |     6 |
| 2015-07-11 |     1 |
| 2015-07-18 |     4 |
| 2015-07-25 |     2 |
+------------+-------+
12 rows in set (0.02 sec)

Change 227398 merged by Matthias Mullie:
Fix unique-users query

https://gerrit.wikimedia.org/r/227398

DannyH edited a custom field.Jul 29 2015, 10:40 PM
DannyH closed this task as Resolved.Aug 3 2015, 5:02 PM
DannyH moved this task from Product Review to Done on the Collaboration-Team-Archive-2015-2016 board.
DannyH edited a custom field.Aug 4 2015, 5:46 PM