Page MenuHomePhabricator

Create production database and users for Striker
Closed, ResolvedPublic

Description

  • database name: striker
  • shard: m5
  • users:
    • striker - application user. Will connect to DB from californium.
    • striker_admin - admin user. Will connect to DB from terbium and californium.
  • grants:
    • GRANT ALL ON striker.* TO 'striker_admin'@'%'
    • GRANT SELECT, INSERT, UPDATE, DELETE ON striker.* TO 'striker'@'%'
  • charset: utf8mb4
  • collation: utf8mb4_bin
CREATE DATABASE striker CHARACTER SET utf8mb4 COLLATE utf8mb4_bin;
NOTE: Server will need innodb_large_prefix enabled to support UNIQUE indexes on varchar(255) columns with a utf8mb4 encoding

Striker will be deployed on californium. As a Labs related service I think that its database should live on the m5 shard.

The application currently has 14 tables. Most of them will be tiny (<100 records). There will be a couple of tables that could grow to have a few thousand rows (one tracks Labs user accounts and the other tracks tools). Over the next 12 months additional features will be added which will quite likely add more tables. One desired feature would add tables to describe Tools and track the history of edits to those descriptions. I would be quite surprised if that history table ends up having more than 100,000 records in the near term.

As a django app, the framework would really like to manage database schema changes. I'm not a huge fan of that for various reasons, so I think we should make two separate user accounts: one for the application runtime that is granted CRUD rights on the database and a separate admin user that is granted full access to alter the schema.

The initial schema can be created using a mysqldump from a testing instance (e.g. striker-dev.commtech.eqiad.wmflabs) with subsequent migrations generated via manage.py sqlmigrate and applied manually. I will need access to the striker_admin password to apply these schema changes.

I'd also like an opinion on the proper engine (InnoDB?) and default charset (utf8?) to use in a project of this type.

Event Timeline

bd808 created this task.Aug 9 2016, 11:59 PM
Restricted Application added a subscriber: Aklapper. · View Herald TranscriptAug 9 2016, 11:59 PM
jcrespo moved this task from Triage to Next on the DBA board.Aug 10 2016, 8:58 AM
jcrespo added a subscriber: jcrespo.

InnoDB as an engine; binary if it does something related to mediawiki; utf8mb4 otherwise if it requires international character handling (UTF-8 support).

Change 305046 had a related patch set uploaded (by BryanDavis):
Add initial database schema

https://gerrit.wikimedia.org/r/305046

bd808 added a comment.Aug 16 2016, 8:20 PM

InnoDB as an engine; binary if it does something related to mediawiki; utf8mb4 otherwise if it requires international character handling (UTF-8 support).

InnoDB's unique index length limit and utf8mb4 are making me sad. There are quite a few varchar(255) fields in the schema. Generally these are related to mirroring data from MediaWiki or other external data sources that are using the same length. The default limit of 767 bytes for a unique index means that you can only index a varchar(191) with a utf8mb4 encoding.

Options for dealing with this are:

  • Use another encoding (utf8, binary) and make sure that the Django/python application layer can deal with this. Using utf8 will limit inputs to 3-byte encodings (U+FFFF and below).
  • Use shorter fields and know that it will be possible that some values from MediaWiki will fit in the byte limit but not in the character limit (e.g. 192 ASCII chars).
  • Find out if we have innodb_large_prefix already enabled on the target database (or if we can enable it without causing horrible problems).
bd808 added a comment.Aug 16 2016, 9:56 PM

I tried out using binary as the encoding and found that Django/Python3 returns the fields with that encoding as byte strings (e.g. b'bd808') even if character_set_connection=utf8 is set for the connection. I could work around that by marshaling the data to/from bytes in the model layer, but I'd really rather not add that sort of data manipulation in the app if possible. Things seem to work ok in initial testing when utf8 is used as the encoding. As noted in T142545#2558503 the downside of this is that characters above U+FFFF cannot be stored at all.

I've got my fingers crossed for @jcrespo to tell me that innodb_large_prefix is easily accomplished or better yet already active on the m5 shard.

bd808 moved this task from Backlog to Ready on the Striker board.Aug 16 2016, 10:56 PM

I recommended to use utf8mb4, which is real UTF-8; I recommended against utf8.

jcrespo added a comment.EditedAug 17 2016, 7:01 AM

Why do you think innodb_large_prefix should not be possible? :-P It is not on mediawiki only, which has to be compatible with a 10-year-old MySQL, for some reason. :'-(

bd808 added a comment.Aug 17 2016, 3:32 PM

Why do you think innodb_large_prefix should not be possible? It is not on mediawiki only, which has to be compatible with a 10-year-old MySQL, for some reason.

I just didn't want to automatically assume that it would be possible because I do not fully understand the runtime cost implications of it. I'm glad to hear that it should not be a problem. Having the 3072 byte index limit will make dealing with 3rd party Django components easier as Django assumes that it is possible to create a unique index on varchar(255) columns at a pretty deep level.

I realize that my comment may sound mean, it is not my intention. Please read them as if there was an emoticon ;-P at the start of each of them, I try to do an informal expression, but that not always gets transmitted well in written form. Sorry.

bd808 updated the task description. (Show Details)Aug 17 2016, 3:37 PM
bd808 updated the task description. (Show Details)Aug 17 2016, 5:07 PM

@jcrespo No worries. I'm getting to be able to read your tone after following various tickets. :)

Would it be reasonable to have these users and database created by Wednesday 2016-08-24? @yuvipanda and I are trying to get things lined up for a production deploy of Striker on that date, but haven't made that a firm plan yet pending hearing from you.

Yes, but check the comments on gerrit: I need a source IP and depending where it is, it may need opening "holes" at various levels.

Yes, but check the comments on gerrit: I need a source IP and depending where it is, it may need opening "holes" at various levels.

@jcrespo we are putting this on the same server as horizon (the main labs dashboard) for now so 208.80.154.147, logistics on access otherwise should be ok.

bd808 updated the task description. (Show Details)Aug 17 2016, 8:10 PM
jcrespo raised the priority of this task from Normal to High.Aug 18 2016, 2:23 PM
jcrespo moved this task from Next to In progress on the DBA board.

Change 305506 had a related patch set uploaded (by Jcrespo):
Add public logic for grants to m5 db for striker application

https://gerrit.wikimedia.org/r/305506

Change 305512 had a related patch set uploaded (by Jcrespo):
Fake passwords to mimic in labs the striker-database ones

https://gerrit.wikimedia.org/r/305512

Change 305512 merged by Jcrespo:
Fake passwords to mimic in labs the striker-database ones

https://gerrit.wikimedia.org/r/305512

Change 305599 had a related patch set uploaded (by BryanDavis):
mysql: innodb_large_prefix and CREATE DATABASE options

https://gerrit.wikimedia.org/r/305599

Change 305506 merged by Jcrespo:
Add public logic for grants to m5 db for striker application

https://gerrit.wikimedia.org/r/305506

root@californium:~$ mysql -h m5-master.eqiad.wmnet -u striker -p -e "SHOW CREATE DATABASE striker"
Enter password: 
+----------+-----------------------------------------------------------------------------------------+
| Database | Create Database                                                                         |
+----------+-----------------------------------------------------------------------------------------+
| striker  | CREATE DATABASE `striker` /*!40100 DEFAULT CHARACTER SET utf8mb4 COLLATE utf8mb4_bin */ |
+----------+-----------------------------------------------------------------------------------------+
jcrespo moved this task from In progress to Done on the DBA board.Aug 19 2016, 11:15 AM
MariaDB MISC m5 localhost (none) > SHOW GLOBAL VARIABLES like 'innodb_file_format';
+--------------------+-----------+
| Variable_name      | Value     |
+--------------------+-----------+
| innodb_file_format | Barracuda |
+--------------------+-----------+
1 row in set (0.00 sec)

MariaDB MISC m5 localhost (none) > SHOW GLOBAL VARIABLES like 'innodb_large_prefix';
+---------------------+-------+
| Variable_name       | Value |
+---------------------+-------+
| innodb_large_prefix | ON    |
+---------------------+-------+
1 row in set (0.00 sec)

This is resolved to me, only keeping it open waiting for confirmation from @bd808 everything worked.

Change 305046 merged by jenkins-bot:
Add initial database schema

https://gerrit.wikimedia.org/r/305046

Change 305940 had a related patch set uploaded (by BryanDavis):
Use utf8mb4 charset with MySQL backend

https://gerrit.wikimedia.org/r/305940

Change 305940 merged by jenkins-bot:
Use utf8mb4 charset with MySQL backend

https://gerrit.wikimedia.org/r/305940

bd808 moved this task from Ready to Doing on the Striker board.Aug 25 2016, 9:39 PM
bd808 closed this task as Resolved.Aug 25 2016, 11:54 PM
bd808 assigned this task to jcrespo.

Change 305599 merged by jenkins-bot:
mysql: innodb_large_prefix and CREATE DATABASE options

https://gerrit.wikimedia.org/r/305599

bd808 moved this task from Doing to Done on the Striker board.Mar 4 2017, 3:57 AM