Page MenuHomePhabricator

DB schemas (production changes and test DB) and SQL commands to run for new banner and LP impression data from EL
Closed, ResolvedPublic2 Estimated Story Points

Assigned To
Authored By
AndyRussG
Jun 6 2018, 3:49 PM
Referenced Files
None

Description

Here's what we need:

  1. Specify what (if any) new columns and tables may be needed in pgehres, and make the changes
  2. Create a database for ingressing test data during an initial testing phase
  3. Ensure a seamless switchover from the current pipeline to the new data from EventLogging

Let's plan here the exact SQL commands to run and plan exactly how and when to run them.

Requirements:

  • No data loss or corruption due to switchover.
  • Existing tools used on the production database should work without changes.

Thanks!!!

Event Timeline

AndyRussG added a subscriber: Ejegg.
Vvjjkkii renamed this task from DB schemas (production changes and test DB) and SQL commands to run for new banner and LP impression data from EL to 7ibaaaaaaa.Jul 1 2018, 1:06 AM
Vvjjkkii removed AndyRussG as the assignee of this task.
Vvjjkkii raised the priority of this task from Medium to High.
Vvjjkkii updated the task description. (Show Details)
Vvjjkkii removed the point value for this task.
Vvjjkkii removed subscribers: Aklapper, gerritbot.
CommunityTechBot renamed this task from 7ibaaaaaaa to DB schemas (production changes and test DB) and SQL commands to run for new banner and LP impression data from EL.Jul 2 2018, 8:08 AM
CommunityTechBot assigned this task to AndyRussG.
CommunityTechBot lowered the priority of this task from High to Medium.
CommunityTechBot set the point value for this task to 2.
CommunityTechBot updated the task description. (Show Details)
CommunityTechBot added subscribers: Aklapper, gerritbot.

Proposed plan:

Only one change to the production database: rename squidlog table to files_processed. (For other DB cleanup tasks, see T198641, T198639 and T198635. Those changes are not needed for the Kafka pipeline overhaul.) We'll make this change at the same time as we switch the new scripts from the testing database to the production one.

SQL to create the testing database:

CREATE DATABASE new_kafka_testing;

-- Adjust permissions as required (details to be determined)

USE new_kafka_testing;

CREATE TABLE `files_processed` (
  id              INT(11)       UNSIGNED AUTO_INCREMENT,
  filename        VARCHAR(128)  NOT NULL,
  impressiontype  VARCHAR(128)  NOT NULL,
  timestamp       TIMESTAMP,

  PRIMARY KEY (id),
  UNIQUE KEY (filename)
) DEFAULT CHARACTER SET = utf8 ENGINE = InnoDB;

CREATE TABLE `project` (
  id        SMALLINT(3)   UNSIGNED AUTO_INCREMENT,
  project   VARCHAR(128)  NOT NULL,

  PRIMARY KEY(id),
  UNIQUE KEY(project)
) DEFAULT CHARACTER SET = utf8 ENGINE = InnoDB;

CREATE TABLE `language` (
  id        SMALLINT(3)   UNSIGNED AUTO_INCREMENT,
  language  VARCHAR(128)  DEFAULT '' NOT NULL,
  iso_code  VARCHAR(24)   NOT NULL,

  PRIMARY KEY (id),
  UNIQUE KEY (iso_code)
) DEFAULT CHARACTER SET = utf8 ENGINE = InnoDB;

CREATE TABLE `country` (
  id        SMALLINT(3)   UNSIGNED AUTO_INCREMENT,
  country   VARCHAR(128)  DEFAULT '' NOT NULL,
  iso_code  VARCHAR(8) NOT NULL,

  PRIMARY KEY(id),
  UNIQUE KEY (iso_code)
) DEFAULT CHARACTER SET = utf8 ENGINE = InnoDB;

CREATE TABLE `bannerimpressions` (
  id              INT(11)       UNSIGNED AUTO_INCREMENT,
  timestamp       TIMESTAMP     DEFAULT CURRENT_TIMESTAMP NOT NULL,
  banner          VARCHAR(255)  CHARACTER SET utf8 DEFAULT '' NOT NULL,
  campaign        VARCHAR(255)  CHARACTER SET utf8 DEFAULT '' NOT NULL,
  project_id      SMALLINT(3)   UNSIGNED DEFAULT NULL,
  language_id     SMALLINT(3)   UNSIGNED DEFAULT NULL,
  country_id      SMALLINT(3)   UNSIGNED DEFAULT NULL,
  count           MEDIUMINT(11) DEFAULT 0,

  PRIMARY KEY (id),
  UNIQUE KEY(timestamp, banner, campaign, project_id, language_id, country_id),
  INDEX (timestamp),
  INDEX (banner),
  INDEX (campaign),
  INDEX (project_id),
  INDEX (language_id),
  INDEX (country_id)
) DEFAULT CHARACTER SET = utf8 ENGINE = InnoDB;

CREATE TABLE `landingpageimpression_raw` (
  id              INT(11)       UNSIGNED AUTO_INCREMENT,
  timestamp       TIMESTAMP     DEFAULT CURRENT_TIMESTAMP NOT NULL,
  squid_id        SMALLINT(11)  UNSIGNED DEFAULT NULL,
  squid_sequence  INT(11)       UNSIGNED DEFAULT NULL,
  utm_source      VARCHAR(255)  CHARACTER SET utf8 DEFAULT '' NOT NULL,
  utm_campaign    VARCHAR(255)  CHARACTER SET utf8 DEFAULT '' NOT NULL,
  utm_medium      VARCHAR(255)  CHARACTER SET utf8 DEFAULT '' NOT NULL,
  utm_key         VARCHAR(128)  CHARACTER SET utf8 DEFAULT '' NOT NULL,
  landingpage     VARCHAR(255)  CHARACTER SET utf8 DEFAULT '' NOT NULL,
  project_id      SMALLINT(3)   UNSIGNED DEFAULT NULL,
  language_id     SMALLINT(3)   UNSIGNED DEFAULT NULL,
  country_id      SMALLINT(3)   UNSIGNED DEFAULT NULL,
  processed       TINYINT(1)    DEFAULT 0,

  PRIMARY KEY (id),
  INDEX (timestamp),
  INDEX (utm_source),
  INDEX (utm_campaign),
  INDEX (utm_medium),
  INDEX (utm_key),
  INDEX (landingpage),
  INDEX (project_id),
  INDEX (language_id),
  INDEX (country_id),
  INDEX (processed),
  UNIQUE (squid_id, squid_sequence, timestamp)
) DEFAULT CHARACTER SET = utf8 ENGINE = InnoDB;

CREATE TABLE `donatewiki_unique` (
  id              INT(11)       UNSIGNED AUTO_INCREMENT,
  timestamp       TIMESTAMP     DEFAULT CURRENT_TIMESTAMP NOT NULL,
  utm_source      VARCHAR(255)  CHARACTER SET utf8 DEFAULT '' NOT NULL,
  utm_campaign    VARCHAR(255)  CHARACTER SET utf8 DEFAULT '' NOT NULL,
  contact_id      VARCHAR(255)  CHARACTER SET utf8 DEFAULT '' NOT NULL,
  link_id         VARCHAR(128)  CHARACTER SET utf8 DEFAULT '' NOT NULL,

  PRIMARY KEY (id),
  UNIQUE KEY utm_source (utm_source, contact_id)
) DEFAULT CHARACTER SET = utf8 ENGINE = InnoDB;

SQL to rename squidlog and create a view for the old name, to run when we switch the new script to production:

use pgehres;
RENAME TABLE squidlog TO files_processed;
CREATE VIEW squidlog AS SELECT id, filename, impressiontype, timestamp FROM files_processed;
-- Fix permissions for renamed table and view

Updated SQL to create the testing database, as per FRUEC's sql/create_tables.sql.

CREATE DATABASE fruec_testing;

-- Adjust permissions as required (details to be determined)

USE fruec_testing;

CREATE TABLE `files` (
  id               INT(11)       UNSIGNED AUTO_INCREMENT,
  filename         VARCHAR(128)  NOT NULL,
  impressiontype   VARCHAR(15)   NOT NULL,
  timestamp        TIMESTAMP     NOT NULL,
  directory        VARCHAR(256)  NOT NULL,
  sample_rate      INT(2)        UNSIGNED DEFAULT NULL,
  status           ENUM('processing', 'consumed') NOT NULL,
  consumed_events  INT           UNSIGNED DEFAULT NULL,
  ignored_events   INT           UNSIGNED DEFAULT NULL,
  invalid_events    INT           UNSIGNED DEFAULT NULL,

  PRIMARY KEY (id),
  UNIQUE KEY (filename),
  INDEX (timestamp),
  INDEX (status)
) DEFAULT CHARACTER SET = utf8 ENGINE = InnoDB;

CREATE TABLE `project` (
  id        SMALLINT(3)   UNSIGNED AUTO_INCREMENT,
  project   VARCHAR(128)  NOT NULL,

  PRIMARY KEY (id),
  UNIQUE KEY (project)
) DEFAULT CHARACTER SET = utf8 ENGINE = InnoDB;

CREATE TABLE `language` (
  id        SMALLINT(3)   UNSIGNED AUTO_INCREMENT,
  iso_code  VARCHAR(24)   NOT NULL,

  PRIMARY KEY (id),
  UNIQUE KEY (iso_code)
) DEFAULT CHARACTER SET = utf8 ENGINE = InnoDB;

CREATE TABLE `country` (
  id        SMALLINT(3)   UNSIGNED AUTO_INCREMENT,
  iso_code  VARCHAR(8)    NOT NULL,

  PRIMARY KEY (id),
  UNIQUE KEY (iso_code)
) DEFAULT CHARACTER SET = utf8 ENGINE = InnoDB;

CREATE TABLE `bannerimpressions` (
  id              INT(11)       UNSIGNED AUTO_INCREMENT,
  timestamp       TIMESTAMP     DEFAULT CURRENT_TIMESTAMP NOT NULL,
  banner          VARCHAR(255)  CHARACTER SET utf8 DEFAULT '' NOT NULL,
  campaign        VARCHAR(255)  CHARACTER SET utf8 DEFAULT '' NOT NULL,
  project_id      SMALLINT(3)   UNSIGNED DEFAULT NULL,
  language_id     SMALLINT(3)   UNSIGNED DEFAULT NULL,
  country_id      SMALLINT(3)   UNSIGNED DEFAULT NULL,
  count           MEDIUMINT(11) DEFAULT 0,
  file_id         INT(11)       UNSIGNED NOT NULL,

  PRIMARY KEY (id),
  UNIQUE KEY (timestamp, banner, campaign, project_id, language_id, country_id),
  INDEX (timestamp),
  INDEX (banner),
  INDEX (campaign),
  INDEX (project_id),
  INDEX (language_id),
  INDEX (country_id),
  INDEX (file_id)
) DEFAULT CHARACTER SET = utf8 ENGINE = InnoDB;

CREATE TABLE `landingpageimpression_raw` (
  id              INT(11)       UNSIGNED AUTO_INCREMENT,
  timestamp       TIMESTAMP     DEFAULT CURRENT_TIMESTAMP NOT NULL,
  utm_source      VARCHAR(255)  CHARACTER SET utf8 DEFAULT '' NOT NULL,
  utm_campaign    VARCHAR(255)  CHARACTER SET utf8 DEFAULT '' NOT NULL,
  utm_medium      VARCHAR(255)  CHARACTER SET utf8 DEFAULT '' NOT NULL,
  utm_key         VARCHAR(128)  CHARACTER SET utf8 DEFAULT '' NOT NULL,
  landingpage     VARCHAR(255)  CHARACTER SET utf8 DEFAULT '' NOT NULL,
  project_id      SMALLINT(3)   UNSIGNED DEFAULT NULL,
  language_id     SMALLINT(3)   UNSIGNED DEFAULT NULL,
  country_id      SMALLINT(3)   UNSIGNED DEFAULT NULL,
  file_id         INT(11)       UNSIGNED NOT NULL,

  PRIMARY KEY (id),
  INDEX (timestamp),
  INDEX (utm_source),
  INDEX (utm_campaign),
  INDEX (utm_medium),
  INDEX (utm_key),
  INDEX (landingpage),
  INDEX (project_id),
  INDEX (language_id),
  INDEX (country_id),
  INDEX (file_id)
) DEFAULT CHARACTER SET = utf8 ENGINE = InnoDB;

CREATE TABLE `donatewiki_unique` (
  id              INT(11)       UNSIGNED AUTO_INCREMENT,
  timestamp       TIMESTAMP     DEFAULT CURRENT_TIMESTAMP NOT NULL,
  utm_source      VARCHAR(255)  CHARACTER SET utf8 DEFAULT '' NOT NULL,
  utm_campaign    VARCHAR(255)  CHARACTER SET utf8 DEFAULT '' NOT NULL,
  contact_id      VARCHAR(255)  CHARACTER SET utf8 DEFAULT '' NOT NULL,
  link_id         VARCHAR(128)  CHARACTER SET utf8 DEFAULT '' NOT NULL,
  file_id         INT(11)       UNSIGNED NOT NULL,

  PRIMARY KEY (id),
  UNIQUE KEY utm_source (utm_source, contact_id),
  INDEX (file_id)
) DEFAULT CHARACTER SET = utf8 ENGINE = InnoDB;

So, the SQL to create the testing database should just be whatever's in sql/create_tables.sql in FRUEC.

For the commands to update pgehres, I'm using this GitHub gist, so we can track changes. (I haven't tested that code yet... I'll confirm here when I have.)

I think we're happy this is done as it's a prereq for FRUEC?

I'm closing this task, as the initial work is done, and the remaining work is now tracked separately.

  • SQL to set up the testing database have been applied on production. It's the same as what is required for FRUEC developer setup, now in create_tables.sql in the FRUEC repo.
  • T234958 is for a needed update to the SQL to modify the production database (pgehres).
  • T234969 is for applying the SQL to modify the production database.

Thanks!!