Page MenuHomePhabricator

Set up bucketization of editCount fields {tick}
Closed, ResolvedPublic

Description

  1. Add a new column 'editCountBucket' to a closed set of tables
    • Those are a currenlty existing closed set of tables, tables created in the future should implement any bucketization themselves.
    • No changes to the schemas are needed, just to the tables.
  1. Create a SQL script that
    • Given a schema table, an editCount field and a time range
    • Populates the editCountBucket field for that time range
    • The value of editCountBucket should be:
editCount == 0            --->   "0 edits"
1 <= editCount < 5        --->   "1-4 edits"
5 <= editCount < 100      --->   "5-99 edits"
100 <= editCount < 1000   --->   "100-999 edits"
1000 <= editCount         --->   "1000+ edits"

Note that the editCount field name may vary for each schema, for example: editCount vs recipientEditCount vs userEditCount.

  1. Set up the script for periodic execution
    • It must ensure that the field editCountBucket is populated for all schema events before they reach 90 days of age.
    • The script should be automatized for all tables in the closed set.

The list of tables + the editCount field name needs to be created still. It must be extracted from the EL audit spreadsheet, column N:
https://docs.google.com/spreadsheets/d/1mQtbsbGHLbGsHeNaYFCdg6X4K1VSxjUStBpx4GmVbhk/edit#gid=1610723354

Event Timeline

mforns created this task.Aug 12 2015, 6:51 PM
mforns raised the priority of this task from to Needs Triage.
mforns updated the task description. (Show Details)
mforns added a project: Analytics-Backlog.
mforns added a subscriber: mforns.
Restricted Application added a subscriber: Aklapper. · View Herald TranscriptAug 12 2015, 6:51 PM
mforns updated the task description. (Show Details)Aug 12 2015, 6:54 PM
mforns set Security to None.
kevinator triaged this task as High priority.Aug 20 2015, 5:28 PM
kevinator moved this task from Incoming to Prioritized on the Analytics-Backlog board.
kevinator moved this task from Prioritized to Blocked on the Analytics-Backlog board.

@Springle

  • Do you think this is possible?
  • Does it make sense to do it like this?
  • Can we write the SQL script and send it to you for CR?

Thanks!

@mforns, yes, feel free.

Eventlogging is alone on its master, so in theory you can hammer it quite hard. However heavy updates can still cause replication lag, so, as ever, it should be batched and rate-limited.

mforns claimed this task.Aug 31 2015, 2:37 PM
mforns edited projects, added Analytics-Kanban; removed Analytics-Backlog.
mforns moved this task from Next Up to In Progress on the Analytics-Kanban board.
mforns added subscribers: Moushira, Milimetric.
mforns moved this task from In Progress to Paused on the Analytics-Kanban board.Aug 31 2015, 3:12 PM
mforns reassigned this task from mforns to jcrespo.Sep 9 2015, 12:42 AM

@jcrespo
I assigned the task to you, as we spoke.
Please, let me know if I can help you in any way.
Thanks!

mforns moved this task from Paused to Next Up on the Analytics-Kanban board.Sep 9 2015, 12:45 AM
mforns moved this task from Next Up to Paused on the Analytics-Kanban board.
jcrespo raised the priority of this task from High to Needs Triage.Sep 9 2015, 10:46 AM
ggellerman moved this task from Blocked to Radar on the Analytics-Backlog board.

It would really help me if, in addition to the Google Spreadsheet (probably can be done from it), we had a list of table names and the name of the column to count in pure text format (CSV, for example).

The idea would be to have a trigger that, on row insert/update, if the counter column is not (special value), or it will not be set to (special value), updates the new column with the right category.

When purged (timestamp > 90 days), the counter value is set to that special value, leaving only the category.

"special value" can be NULL, 0 or -1, it depends if users do things like SUM() on that column. Equally, the new column can be an integer or an enum (string-like), depending if they are going to do sum that new field.


Wouldn't be easier to reuse the same field and having an extra boolean field with anonymized = TRUE (that way users can continue doing sum(), etc, but with fake values)? Need to do more research on this, not very convinced about the proposed solution.

jcrespo triaged this task as Normal priority.Sep 29 2015, 12:52 PM
jcrespo added a project: DBA.
jcrespo moved this task from Triage to Backlog on the DBA board.
mforns added a comment.Oct 6 2015, 3:03 PM

@jcrespo
Here's a TSV with the data you ask for:

table_name	edit_count_field
MobileWebWatching_11761466	event_userEditCount
GatherClicks_11639881	event_userEditCount
GatherClicks_11770314	event_userEditCount
...

The idea would be to have a trigger that, on row insert/update, if the counter column is not (special value), or it will not be set to (special value), updates the new column with the right category.
When purged (timestamp > 90 days), the counter value is set to that special value, leaving only the category.

Awesome. It's what we imagined.

Wouldn't be easier to reuse the same field and having an extra boolean field with anonymized = TRUE (that way users can continue doing sum(), etc, but with fake values)? Need to do more research on this, not very convinced about the proposed solution.

I don't see the point in doing SUM() of categorized values... If we use an integer to express an interval, what would we use, lower bound, or higher bound?

"5-99 edits"  -->  5
OR
"5-99 edits"  -->  99

In any case, the SUM() would be a heavily underestimated or overestimated value, no? I would say having a string is better, at least for the current schemas. In new schemas, the creators can choose to implement their own custom categorizations for editCount fields, maybe smaller buckets? Or even integer representations, if they like.

jcrespo added a comment.EditedOct 6 2015, 3:19 PM

@mforns Of course there is no reason to do a SUM of that, but can you assure that right now every user is checking not to do sums of values > 90 days? Because if not, current applications will break.

I am ok if that is a risk you can assume (in other words, breaking is preferred to bogus results).

mforns added a comment.Oct 6 2015, 3:25 PM

@jcrespo
I understand.

Breaking is not good :]
But in this case, I would prefer that, because it makes the fact explicit, that we are purging the data after 90 days.
We Analytics should work on ensuring that the schema owners know that and know when this purging will be activated.

jcrespo added a comment.EditedOct 6 2015, 3:28 PM

Understood, now that it has been clarified.

Schema changes may take a while. But I am working on them (need research of quantities and sizes).

mforns added a comment.Oct 6 2015, 5:05 PM

Cool!

One note: This task can already start. But the auto-purging task T108850 needs to wait until we have notified all the schema owners.

I am rolling the schema change right now:

MariaDB EVENTLOGGING m4 localhost log > SHOW CREATE TABLE MobileWebWatching_11761466\G
*************************** 1. row ***************************
       Table: MobileWebWatching_11761466
Create Table: CREATE TABLE `MobileWebWatching_11761466` (
  `uuid` char(32) NOT NULL,
  `clientIp` varchar(191) DEFAULT NULL,
  `timestamp` varchar(14) NOT NULL,
  `userAgent` varchar(191) DEFAULT NULL,
  `webHost` varchar(191) DEFAULT NULL,
  `wiki` varchar(191) NOT NULL,
  `event_funnel` varchar(191) NOT NULL,
  `event_isWatched` tinyint(1) NOT NULL,
  `event_mobileMode` varchar(191) NOT NULL,
  `event_userEditCount` bigint(20) DEFAULT NULL,
  `event_userId` bigint(20) NOT NULL,
  `editCountBucket` enum('0 edits','1-4 edits','5-99 edits','100-999 edits','1000+ edits') DEFAULT NULL, -- ** this is the new column **
  UNIQUE KEY `ix_MobileWebWatching_11761466_uuid` (`uuid`),
  KEY `ix_MobileWebWatching_11761466_timestamp` (`timestamp`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8
jcrespo moved this task from Backlog to In progress on the DBA board.Oct 22 2015, 3:00 PM

It will take a day to convert MobileWebClickTracking_5929948 (>300GB). If it fails, we will try with regular tokudb online table.

Writes can continue without problem while the schema change is ongoing, although a small write penalty could be suffered.

Nuria added a subscriber: Nuria.Oct 22 2015, 7:06 PM
jcrespo added a comment.EditedOct 23 2015, 9:38 AM
(in cleanup) 2015-10-23T09:36:38 Error copying rows from 
`log`.`MobileWebClickTracking_5929948` to 
`log`.`_MobileWebClickTracking_5929948_new`: DBD::mysql::st execute failed: 
MySQL server has gone away [for Statement "SHOW GLOBAL STATUS LIKE ?" with 
ParamValues: 0='Threads_running'] at /usr/bin/pt-online-schema-change line 8157

I've finished the alter table (CC @Nuria), by doing Toku tables with its online ALTER. This may make selects a bit slower afterwards, but nothing different from what it is happening now.

This the table definition, with the new column, of the tables in the provided .csv:

Create Table: CREATE TABLE `MobileWebWatchlistClickTracking_10720361` (
  `id` int(11) NOT NULL AUTO_INCREMENT,
  `uuid` char(32) NOT NULL,
  `clientIp` varchar(191) DEFAULT NULL,
  `clientValidated` tinyint(1) DEFAULT NULL,
  `isTruncated` tinyint(1) DEFAULT NULL,
  `timestamp` varchar(14) NOT NULL,
  `userAgent` varchar(191) DEFAULT NULL,
  `webHost` varchar(191) DEFAULT NULL,
  `wiki` varchar(191) NOT NULL,
  `event_destination` varchar(191) DEFAULT NULL,
  `event_mobileMode` varchar(191) NOT NULL,
  `event_name` varchar(191) DEFAULT NULL,
  `event_userEditCount` bigint(20) DEFAULT NULL,
  `event_username` varchar(191) DEFAULT NULL,
  `editCountBucket` enum('0 edits','1-4 edits','5-99 edits','100-999 edits','1000+ edits') DEFAULT NULL,
  PRIMARY KEY (`id`),
  UNIQUE KEY `ix_MobileWebWatchlistClickTracking_10720361_uuid` (`uuid`),
  KEY `ix_MobileWebWatchlistClickTracking_10720361_timestamp` (`timestamp`)
) ENGINE=InnoDB AUTO_INCREMENT=608312 DEFAULT CHARSET=utf8
1 row in set (0.00 sec)

The idea is now:
a) Setup 2 triggers per table on update and on insert that updates the new column automatically (based on the name provided on the file)
b) Do a one time pass/null edit that populates the new column

BTW, editCountBucket is currently NULL for all rows:

mysql> SELECT editCountBucket FROM MobileWebWatchlistClickTracking_10720361 LIMIT 1\G
*************************** 1. row ***************************
editCountBucket: NULL
1 row in set (0.00 sec)

@mforns Is there something that you wanted to tell me before we continue?

jcrespo moved this task from In progress to Backlog on the DBA board.Oct 23 2015, 11:08 AM

@jcrespo

I've finished the alter table

Awesome, thank you!

The idea is now:
a) Setup 2 triggers per table on update and on insert that updates the new column automatically (based on the name provided on the file)
b) Do a one time pass/null edit that populates the new column

Makes total sense.

Is there something that you wanted to tell me before we continue?

Not for this task. We can continue with 1) and 2).

However, for the auto-purging task (next in row): yes, there are things to discuss! I've sent an email to you explaining the issue. We can continue the discussion in the correspondent task: T108850.

Here is the thing: we need to put dbstore1002 up to date, because it is a phantom slave (scripts in terbium update them incrementally).

This should fix: T116599

Also has to be applied independently on db1047.

Which I just did.

As an update, this task showed to be more complex than initially thought. The complex setup of the eventlogging schema means that it is very prone to break, as it happened some weeks ago. That doesn't change anything, except that more resources have to be invested on it for it to be successful.

That doesn't mean that it is not being worked on, but it has to share time with other 125 tickets, sometimes arriving with higher priority, like when mediawiki databases have user-facing issues, or eventlogging replication is broken.

Something I would suggest is:

  • Not using emails to discuss tickets- I cannot keep track of them coming from 200 different people, so they are all ignored. Use phabricator instead. Maybe this is blocking something, but if it is not on phabricator, I cannot know about it.
  • In all other Database-related code, developers themselves manage the SQL and schema (and I only audit it), as it is considered being part of the application. If you are not ok with the current pace of development, I would recommend committing more resources to it, by sending patches. Otherwise, this is 2/125ths of my total time.
Nuria added a comment.Nov 19 2015, 5:26 PM

@jcrespo: Super thanks for your work on this. We understand that priority wise this ticket is not urgent, while being important. We will watch this ticket for any upcoming updates.

@jcrespo
Sorry for pinging you via email, I will avoid that in the future.
Thanks!

Also, @jcrespo, we're happy to work on the SQL, we were just following the process we used with Sean where he preferred to do some of this type of work.

I would prefer it too, but I cannot honstly guarantee to work quickly on this.

Milimetric moved this task from Incoming to Radar on the Analytics board.Jan 12 2016, 7:32 PM
jcrespo moved this task from Backlog to Next on the DBA board.Apr 22 2016, 4:10 PM
jcrespo moved this task from Next to In progress on the DBA board.Apr 27 2016, 10:24 AM

I've checked the structure of the tables, and it was applied to all tables on the CSV, but 2, which no longer exist:

MobileWebClickTracking_5830671:
ERROR 1146 (42S02) at line 1: Table 'log.MobileWebClickTracking_5830671' doesn't exist
MobileWebClickTracking_5830800:
ERROR 1146 (42S02) at line 1: Table 'log.MobileWebClickTracking_5830800' doesn't exist
MobileWebClickTracking_5929948:
jcrespo added a comment.EditedApr 27 2016, 11:21 AM

The following trigger was created on the analytics master for MobileWebWatchlistClickTracking_10720361:

SET SESSION sql_log_bin = 0;
CREATE TRIGGER
MobileWebWatchlistClickTracking_10720361_editCountBucket_BI
BEFORE INSERT ON MobileWebWatchlistClickTracking_10720361
FOR EACH ROW
  SET NEW.editCountBucket =
      CASE 
          WHEN NEW.event_userEditCount = 0                 THEN '0 edits'
          WHEN NEW.event_userEditCount BETWEEN   1 AND   4 THEN '1-4 edits'
          WHEN NEW.event_userEditCount BETWEEN   5 AND  99 THEN '5-99 edits'
          WHEN NEW.event_userEditCount BETWEEN 100 AND 999 THEN '100-999 edits'
          WHEN NEW.event_userEditCount >= 1000             THEN '1000+ edits'
          ELSE NULL
      END
;

Check that that is creating the right values on the slaves for that particular table (only on newly inserted records), and I will apply it (with the list you provided me) to all tables.

It seems that table is no longer active, if you have an example of another table to test, feel free.

Nuria added a comment.EditedApr 27 2016, 3:12 PM

@jcrespo; Thank you, so all other tables listed on spreadsheet but the two you mentioned have been updated with edit count, then. Correct?

Nuria added a comment.Apr 27 2016, 3:21 PM

@jcrespo: nevermind my question i see it now where it pertains:

mysql:research@analytics-store.eqiad.wmnet [log]> desc MobileWebDiffClickTracking_10720373;
+---------------------+------------------------------------------------------------------------+------+-----+---------+-------+

FieldTypeNullKeyDefaultExtra

+---------------------+------------------------------------------------------------------------+------+-----+---------+-------+

idint(11)NOPRINULL
uuidbinary(32)NOUNINULL
clientIpvarbinary(191)YESNULL
clientValidatedtinyint(1)YESNULL
isTruncatedtinyint(1)YESNULL
timestampvarbinary(14)NOMULNULL
userAgentvarbinary(191)YESNULL
webHostvarbinary(191)YESNULL
wikivarbinary(191)NOMULNULL
event_destinationvarbinary(191)YESNULL
event_mobileModevarbinary(191)NONULL
event_namevarbinary(191)YESNULL
event_userEditCountbigint(20)YESNULL
event_usernamevarbinary(191)YESNULL
editCountBucketenum('0 edits','1-4 edits','5-99 edits','100-999 edits','1000+ edits')YESNULL

+---------------------+------------------------------------------------------------------------+------+-----+---------+-------+

Nuria added a comment.Apr 27 2016, 3:23 PM

What would be the best way to add an editcount column to new schemas going forward? Should we have an audit once every 6 months to find tables that might need it?

jcrespo added a comment.EditedApr 27 2016, 4:24 PM

This is the proposed next steps:

  1. Deploy to all tables (only MobileWebWatchlistClickTracking_10720361 was was done as a test, waiting for validation- although I have already tested informally with temporary values).
  2. Update the *column* (it has not been done yet- it is being currently done only for new columns. We need to update the rows on all tables on all servers- that may take some considerable amount of time, depending on the available sizes)
  3. The trigger (and the field) must be added by the application on schema creation. The code is up here, only the table name and "edits" field will change every time. This has to go on the application, before values are inserted, otherwise, this long process will have to be repeated every single time a new table is needed. How you decide which tables must have it or not is part of the user handling (how do you create tables now?). I would recommend have your application the possibility of enabling it (probably not disabling it) afterwards dynamically so you do not block on infrastructure anymore.
  4. Proceed with the purging. T108850

I understand the need to involve me for the initial setup (and I never rejected such a role), but I am genuinely worried of needing a DBA or anyone in operations involved on the decisions of your application's coding (aside from performance consultancy/code review, which is always granted). These creates unnecessary blocking on your side on someone with very little bandwidth for anything that is not firefighting issues. If it is MySQL knowledge that your team lacks or worries about, allow me to offer some training to them with all my good intentions. This is not a complain, it is genuine concern of you being blocked for several months on something otherwise trivial, that should not take more than a few hundreds lines of code. In other words, I am ok with being a "consultant", and propose a particular solution, but the application eventlogging needs to be owned by your department (my role is to make sure the servers are working and safe).

This is the full list of triggers I am going to create on the master. Only before insert, assuming no updates take place, ever:

$ ./eventlogging_triggers.sh 

SET SESSION sql_log_bin = 0;
CREATE TRIGGER
MobileWebWatching_11761466_editCountBucket_BI
BEFORE INSERT ON MobileWebWatching_11761466
FOR EACH ROW
  SET NEW.editCountBucket =
      CASE 
          WHEN NEW.event_userEditCount = 0                 THEN '0 edits'
          WHEN NEW.event_userEditCount BETWEEN   1 AND   4 THEN '1-4 edits'
          WHEN NEW.event_userEditCount BETWEEN   5 AND  99 THEN '5-99 edits'
          WHEN NEW.event_userEditCount BETWEEN 100 AND 999 THEN '100-999 edits'
          WHEN NEW.event_userEditCount >= 1000             THEN '1000+ edits'
          ELSE NULL
      END
;

SET SESSION sql_log_bin = 0;
CREATE TRIGGER
GatherClicks_11639881_editCountBucket_BI
BEFORE INSERT ON GatherClicks_11639881
FOR EACH ROW
  SET NEW.editCountBucket =
      CASE 
          WHEN NEW.event_userEditCount = 0                 THEN '0 edits'
          WHEN NEW.event_userEditCount BETWEEN   1 AND   4 THEN '1-4 edits'
          WHEN NEW.event_userEditCount BETWEEN   5 AND  99 THEN '5-99 edits'
          WHEN NEW.event_userEditCount BETWEEN 100 AND 999 THEN '100-999 edits'
          WHEN NEW.event_userEditCount >= 1000             THEN '1000+ edits'
          ELSE NULL
      END
;

SET SESSION sql_log_bin = 0;
CREATE TRIGGER
GatherClicks_11770314_editCountBucket_BI
BEFORE INSERT ON GatherClicks_11770314
FOR EACH ROW
  SET NEW.editCountBucket =
      CASE 
          WHEN NEW.event_userEditCount = 0                 THEN '0 edits'
          WHEN NEW.event_userEditCount BETWEEN   1 AND   4 THEN '1-4 edits'
          WHEN NEW.event_userEditCount BETWEEN   5 AND  99 THEN '5-99 edits'
          WHEN NEW.event_userEditCount BETWEEN 100 AND 999 THEN '100-999 edits'
          WHEN NEW.event_userEditCount >= 1000             THEN '1000+ edits'
          ELSE NULL
      END
;

SET SESSION sql_log_bin = 0;
CREATE TRIGGER
GatherClicks_12114785_editCountBucket_BI
BEFORE INSERT ON GatherClicks_12114785
FOR EACH ROW
  SET NEW.editCountBucket =
      CASE 
          WHEN NEW.event_userEditCount = 0                 THEN '0 edits'
          WHEN NEW.event_userEditCount BETWEEN   1 AND   4 THEN '1-4 edits'
          WHEN NEW.event_userEditCount BETWEEN   5 AND  99 THEN '5-99 edits'
          WHEN NEW.event_userEditCount BETWEEN 100 AND 999 THEN '100-999 edits'
          WHEN NEW.event_userEditCount >= 1000             THEN '1000+ edits'
          ELSE NULL
      END
;

SET SESSION sql_log_bin = 0;
CREATE TRIGGER
GatherFlags_11793295_editCountBucket_BI
BEFORE INSERT ON GatherFlags_11793295
FOR EACH ROW
  SET NEW.editCountBucket =
      CASE 
          WHEN NEW.event_userEditCount = 0                 THEN '0 edits'
          WHEN NEW.event_userEditCount BETWEEN   1 AND   4 THEN '1-4 edits'
          WHEN NEW.event_userEditCount BETWEEN   5 AND  99 THEN '5-99 edits'
          WHEN NEW.event_userEditCount BETWEEN 100 AND 999 THEN '100-999 edits'
          WHEN NEW.event_userEditCount >= 1000             THEN '1000+ edits'
          ELSE NULL
      END
;

SET SESSION sql_log_bin = 0;
CREATE TRIGGER
MobileWebClickTracking_5830671_editCountBucket_BI
BEFORE INSERT ON MobileWebClickTracking_5830671
FOR EACH ROW
  SET NEW.editCountBucket =
      CASE 
          WHEN NEW.event_userEditCount = 0                 THEN '0 edits'
          WHEN NEW.event_userEditCount BETWEEN   1 AND   4 THEN '1-4 edits'
          WHEN NEW.event_userEditCount BETWEEN   5 AND  99 THEN '5-99 edits'
          WHEN NEW.event_userEditCount BETWEEN 100 AND 999 THEN '100-999 edits'
          WHEN NEW.event_userEditCount >= 1000             THEN '1000+ edits'
          ELSE NULL
      END
;

SET SESSION sql_log_bin = 0;
CREATE TRIGGER
MobileWebClickTracking_5830800_editCountBucket_BI
BEFORE INSERT ON MobileWebClickTracking_5830800
FOR EACH ROW
  SET NEW.editCountBucket =
      CASE 
          WHEN NEW.event_userEditCount = 0                 THEN '0 edits'
          WHEN NEW.event_userEditCount BETWEEN   1 AND   4 THEN '1-4 edits'
          WHEN NEW.event_userEditCount BETWEEN   5 AND  99 THEN '5-99 edits'
          WHEN NEW.event_userEditCount BETWEEN 100 AND 999 THEN '100-999 edits'
          WHEN NEW.event_userEditCount >= 1000             THEN '1000+ edits'
          ELSE NULL
      END
;

SET SESSION sql_log_bin = 0;
CREATE TRIGGER
MobileWebClickTracking_5929948_editCountBucket_BI
BEFORE INSERT ON MobileWebClickTracking_5929948
FOR EACH ROW
  SET NEW.editCountBucket =
      CASE 
          WHEN NEW.event_userEditCount = 0                 THEN '0 edits'
          WHEN NEW.event_userEditCount BETWEEN   1 AND   4 THEN '1-4 edits'
          WHEN NEW.event_userEditCount BETWEEN   5 AND  99 THEN '5-99 edits'
          WHEN NEW.event_userEditCount BETWEEN 100 AND 999 THEN '100-999 edits'
          WHEN NEW.event_userEditCount >= 1000             THEN '1000+ edits'
          ELSE NULL
      END
;

SET SESSION sql_log_bin = 0;
CREATE TRIGGER
MobileWebCta_5972486_editCountBucket_BI
BEFORE INSERT ON MobileWebCta_5972486
FOR EACH ROW
  SET NEW.editCountBucket =
      CASE 
          WHEN NEW.event_userEditCount = 0                 THEN '0 edits'
          WHEN NEW.event_userEditCount BETWEEN   1 AND   4 THEN '1-4 edits'
          WHEN NEW.event_userEditCount BETWEEN   5 AND  99 THEN '5-99 edits'
          WHEN NEW.event_userEditCount BETWEEN 100 AND 999 THEN '100-999 edits'
          WHEN NEW.event_userEditCount >= 1000             THEN '1000+ edits'
          ELSE NULL
      END
;

SET SESSION sql_log_bin = 0;
CREATE TRIGGER
MobileWebCta_5972684_editCountBucket_BI
BEFORE INSERT ON MobileWebCta_5972684
FOR EACH ROW
  SET NEW.editCountBucket =
      CASE 
          WHEN NEW.event_userEditCount = 0                 THEN '0 edits'
          WHEN NEW.event_userEditCount BETWEEN   1 AND   4 THEN '1-4 edits'
          WHEN NEW.event_userEditCount BETWEEN   5 AND  99 THEN '5-99 edits'
          WHEN NEW.event_userEditCount BETWEEN 100 AND 999 THEN '100-999 edits'
          WHEN NEW.event_userEditCount >= 1000             THEN '1000+ edits'
          ELSE NULL
      END
;

SET SESSION sql_log_bin = 0;
CREATE TRIGGER
MobileWebDiffClickTracking_10720373_editCountBucket_BI
BEFORE INSERT ON MobileWebDiffClickTracking_10720373
FOR EACH ROW
  SET NEW.editCountBucket =
      CASE 
          WHEN NEW.event_userEditCount = 0                 THEN '0 edits'
          WHEN NEW.event_userEditCount BETWEEN   1 AND   4 THEN '1-4 edits'
          WHEN NEW.event_userEditCount BETWEEN   5 AND  99 THEN '5-99 edits'
          WHEN NEW.event_userEditCount BETWEEN 100 AND 999 THEN '100-999 edits'
          WHEN NEW.event_userEditCount >= 1000             THEN '1000+ edits'
          ELSE NULL
      END
;

SET SESSION sql_log_bin = 0;
CREATE TRIGGER
MobileWebMainMenuClickTracking_10703095_editCountBucket_BI
BEFORE INSERT ON MobileWebMainMenuClickTracking_10703095
FOR EACH ROW
  SET NEW.editCountBucket =
      CASE 
          WHEN NEW.event_userEditCount = 0                 THEN '0 edits'
          WHEN NEW.event_userEditCount BETWEEN   1 AND   4 THEN '1-4 edits'
          WHEN NEW.event_userEditCount BETWEEN   5 AND  99 THEN '5-99 edits'
          WHEN NEW.event_userEditCount BETWEEN 100 AND 999 THEN '100-999 edits'
          WHEN NEW.event_userEditCount >= 1000             THEN '1000+ edits'
          ELSE NULL
      END
;

SET SESSION sql_log_bin = 0;
CREATE TRIGGER
MobileWebMainMenuClickTracking_11568715_editCountBucket_BI
BEFORE INSERT ON MobileWebMainMenuClickTracking_11568715
FOR EACH ROW
  SET NEW.editCountBucket =
      CASE 
          WHEN NEW.event_userEditCount = 0                 THEN '0 edits'
          WHEN NEW.event_userEditCount BETWEEN   1 AND   4 THEN '1-4 edits'
          WHEN NEW.event_userEditCount BETWEEN   5 AND  99 THEN '5-99 edits'
          WHEN NEW.event_userEditCount BETWEEN 100 AND 999 THEN '100-999 edits'
          WHEN NEW.event_userEditCount >= 1000             THEN '1000+ edits'
          ELSE NULL
      END
;

SET SESSION sql_log_bin = 0;
CREATE TRIGGER
MobileWebUIClickTracking_10742159_editCountBucket_BI
BEFORE INSERT ON MobileWebUIClickTracking_10742159
FOR EACH ROW
  SET NEW.editCountBucket =
      CASE 
          WHEN NEW.event_userEditCount = 0                 THEN '0 edits'
          WHEN NEW.event_userEditCount BETWEEN   1 AND   4 THEN '1-4 edits'
          WHEN NEW.event_userEditCount BETWEEN   5 AND  99 THEN '5-99 edits'
          WHEN NEW.event_userEditCount BETWEEN 100 AND 999 THEN '100-999 edits'
          WHEN NEW.event_userEditCount >= 1000             THEN '1000+ edits'
          ELSE NULL
      END
;

SET SESSION sql_log_bin = 0;
CREATE TRIGGER
MobileWebWatchlistClickTracking_10720361_editCountBucket_BI
BEFORE INSERT ON MobileWebWatchlistClickTracking_10720361
FOR EACH ROW
  SET NEW.editCountBucket =
      CASE 
          WHEN NEW.event_userEditCount = 0                 THEN '0 edits'
          WHEN NEW.event_userEditCount BETWEEN   1 AND   4 THEN '1-4 edits'
          WHEN NEW.event_userEditCount BETWEEN   5 AND  99 THEN '5-99 edits'
          WHEN NEW.event_userEditCount BETWEEN 100 AND 999 THEN '100-999 edits'
          WHEN NEW.event_userEditCount >= 1000             THEN '1000+ edits'
          ELSE NULL
      END
;

Mentioned in SAL [2016-04-27T18:02:31Z] <jynus> generating new triggers for eventlogging_sync schema T108856

These are the triggers created, as you can see 13, 2 less than the list proposed due to 2 nonexistent tables:

MariaDB  db1046.eqiad.wmnet log > SELECT trigger_name, EVENT_OBJECT_SCHEMA, EVENT_OBJECT_TABLE FROM information_schema.triggers;
+-------------------------------------------------------------+---------------------+------------------------------------------+
| trigger_name                                                | EVENT_OBJECT_SCHEMA | EVENT_OBJECT_TABLE                       |
+-------------------------------------------------------------+---------------------+------------------------------------------+
| GatherClicks_11639881_editCountBucket_BI                    | log                 | GatherClicks_11639881                    |
| GatherClicks_11770314_editCountBucket_BI                    | log                 | GatherClicks_11770314                    |
| GatherClicks_12114785_editCountBucket_BI                    | log                 | GatherClicks_12114785                    |
| GatherFlags_11793295_editCountBucket_BI                     | log                 | GatherFlags_11793295                     |
| MobileWebClickTracking_5929948_editCountBucket_BI           | log                 | MobileWebClickTracking_5929948           |
| MobileWebCta_5972486_editCountBucket_BI                     | log                 | MobileWebCta_5972486                     |
| MobileWebCta_5972684_editCountBucket_BI                     | log                 | MobileWebCta_5972684                     |
| MobileWebDiffClickTracking_10720373_editCountBucket_BI      | log                 | MobileWebDiffClickTracking_10720373      |
| MobileWebMainMenuClickTracking_10703095_editCountBucket_BI  | log                 | MobileWebMainMenuClickTracking_10703095  |
| MobileWebMainMenuClickTracking_11568715_editCountBucket_BI  | log                 | MobileWebMainMenuClickTracking_11568715  |
| MobileWebUIClickTracking_10742159_editCountBucket_BI        | log                 | MobileWebUIClickTracking_10742159        |
| MobileWebWatching_11761466_editCountBucket_BI               | log                 | MobileWebWatching_11761466               |
| MobileWebWatchlistClickTracking_10720361_editCountBucket_BI | log                 | MobileWebWatchlistClickTracking_10720361 |
+-------------------------------------------------------------+---------------------+------------------------------------------+
13 rows in set (0.01 sec)

Let's wait a day to check for replication errors or application insertion errors, before continuing the process.

@Nuria @jcrespo

We should not support editCount bucketting for new schemas. This task is exclusive for existing schemas at the time of the audit, as a backwards compatibility feature. From the audit onwards, the schemas that want to have editCount fields stored for more than 90 days, will have to send that field pre-bucketized to EL (schema owners should bucketize the field client-side). Otherwise, that field will be necessarily purged after 90 days. That was the initial idea. So, I don't think we need any action in this regard for new schemas.

Mentioned in SAL [2016-04-28T10:41:42Z] <jynus> running update table on eventlogging database on the master (db1046) T108856

This is done for me, waiting for confirmation to resolve the ticket:

MariaDB  dbstore1002.eqiad.wmnet log > SELECT event_userEditCount, editCountBucket FROM MobileWebCta_5972684 ORDER BY timestamp DESC LIMIT 10;
+---------------------+-----------------+
| event_userEditCount | editCountBucket |
+---------------------+-----------------+
|                 260 | 100-999 edits   |
|                   1 | 1-4 edits       |
|                   1 | 1-4 edits       |
|                   0 | 0 edits         |
|                   1 | 1-4 edits       |
|                   1 | 1-4 edits       |
|                   0 | 0 edits         |
|                   4 | 1-4 edits       |
|                   1 | 1-4 edits       |
|                   1 | 1-4 edits       |
+---------------------+-----------------+
10 rows in set (0.00 sec)
jcrespo moved this task from In progress to Done on the DBA board.May 2 2016, 9:06 AM
mforns added a comment.EditedMay 12 2016, 11:09 PM

@Nuria @jcrespo
Shouldn't this be marked as resolved?

Nuria closed this task as Resolved.May 13 2016, 3:21 AM