This means essentially repeating T138894 with amendments to fix the following issues:
* Null usernames were imported as the string `"NULL"` and null datetimes as the datetime `0000-00-00 00:00:00`.
** This is a limitation of the batch output mode of the `mysql` command line client, which doesn't have an option to use a character (like `"`) to enclose fields. This means that there's no way for `mysqlimport` to distinguish the string `"NULL"` from an actual null value. (There's also no option to use an escape sequence like `\N` for nulls.)
** We can work around this using an `insert select` command instead of using `mysql -e` followed by `mysqlimport`. In the past, that caused problems because it locked the tables being selected from, but we should be able to deal with this by selecting a weak isolation level.
* A very small number of users have names ending with backslashes, which messed with the import process and caused some data corruption (limited to individual lines).
** It seems like this can be fixed by feeding `mysqlimport` the option `--fields-escaped-by=""` instead of the default `--fields-escaped-by="\\"`. This makes very little sense, because a backslash is the escape character in the source file (e.g. the username `\wowzeryest\` appears in the file as `\\wowzeryest\\`), but what can you do?
* Seems to undercount mobile edits compared to using multiquery to count change tags?
* Massively undercounts commons activity because file namespaces are not the main namespace
** Create a table of content namespaces and join on it?
* Additional fields?
** Visual edits?
** Split mobile edits into web and apps?
CREATE TABLE `editor_month` (
`wiki` varbinary(255) NOT NULL,
`month` date NOT NULL,
`local_user_id` int(10) unsigned NOT NULL,
`user_name` varbinary(255) DEFAULT NULL,
`edits` int(10) unsigned NOT NULL,
`main_ns_edits` int(10) unsigned NOT NULL,
`deleted_edits` int(10) unsigned NOT NULL,
`mobile_edits` int(10) unsigned NOT NULL,
`bot` tinyint(1) DEFAULT 0,
`user_registration` datetime DEFAULT NULL,
PRIMARY KEY (`wiki`,`month`,`local_user_id`),
KEY `wiki_user` (`wiki`,`user_name`),
KEY `user_name` (`user_name`)
) ENGINE=TokuDB DEFAULT CHARSET=binary `compression`='tokudb_zlib';