Page MenuHomePhabricator

hitcounter and _counter tables are on the cluster but were deleted/unsused?
Closed, ResolvedPublic

Description

I had to recreate both on all s3 wikis due to replication problems going through a slave. The table existed on many slaves previous to that.

Clarify that hitcounter has been effectively deleted and is not in use. Clarify the usage of _counter.

If Ok with that, drop both tables from *all servers* not from only a few.

Event Timeline

Mentioned in SAL [2016-04-30T10:19:34Z] <volans> restarted slave on dbstore1001 skipping missing database T132837

Confirmed: the hitcounter table, page.page_counter field and site_stats.ss_total_views field were all completely removed from MW core (about 2 major release cycles ago) and can be dropped from anywhere they exist.

Thank you demon, this will help solving one of the most important issues with unused tables (do to its specific nature, causing outages).

Hello,

@demon can you review what I want to do to see if it looks good to you?

First, I would like to focus on hitcounter first, so we can minimize risks, so the other schema changed discussed could be discussed in a different ticket.

Second, I would like to go ahead and rename hitcounter to see if any problem arises.
I have picked db1015 from S3 as it has some queries so if it is in use

I have check all the hitcounter tables living in the wikis there:

for i in `mysql -hdb1015 -e "show databases;" -AB `; do echo ***$i***; mysql -hdb1015 -e "select count(*) from $i.hitcounter;";done

Where it exists, it has 0 rows, which is great.
I would like to rename them to something like: TO_DROP_hitcounter so we can see if there are reads failing for whatever reason.

Something like this:

for i in `mysql -hdb1015 -e "show databases;" -AB `; do echo ***$i***; mysql -hdb1015 -e "rename hitcounter to TO_DROP_hitcounter";done

If we see no errors for a few days, I would suggest we proceed and do the same in more slaves, so we can have more testing servers.

We can easily rename them back with the above command too.

If all goes good, I would suggest we drop in all the servers and in the master with set sql_log_bin=0 and DROP TABLE IF EXISTS.

If for any reason we need to create it (empty, as it was) we can always do that with:

for i in `mysql -hdb1015 -e "show databases;" -AB `; do echo ***$i***; mysql -hdb1015  -e "CREATE TABLE \`hitcounter1\` (   \`hc_id\` int(10) unsigned NOT NULL DEFAULT '0' ) ENGINE=MEMORY DEFAULT CHARSET=binary MAX_ROWS=25000;";done

Mentioned in SAL [2016-09-12T10:28:19Z] <marostegui> renaming tables in db1015 - T132837

This has been executed:

for i in `mysql -hdb1015 -e "show databases;" -AB `; do echo ***$i***; mysql -hdb1015 $i -e "rename table hitcounter to TO_DROP_hitcounter";done

If we need to roll back we can run:

for i in `mysql -hdb1015 -e "show databases;" -AB `; do echo ***$i***; mysql -hdb1015 $i -e "rename table TO_DROP_hitcounter to hitcounter";done

Mentioned in SAL (#wikimedia-operations) [2016-09-14T13:20:38Z] <marostegui> renaming tables in s3 codfw - T132837

I have renamed the hitcounter table to TO_DROP_hitcounter in all codfw servers:

dbstore2001.codfw.wmnet
dbstore2002.codfw.wmnet
db2036.codfw.wmnet
db2043.codfw.wmnet
db2050.codfw.wmnet
db2057.codfw.wmnet
db2018.codfw.wmnet

To revert: /home/marostegui/rename_revert

I have renamed the table hitcounter on these hosts today

dbstore1001.eqiad.wmnet
dbstore1002.eqiad.wmnet
labsdb1001.eqiad.wmnet
labsdb1003.eqiad.wmnet
db1069.eqiad.wmnet
db1015.eqiad.wmnet
db1035.eqiad.wmnet

Tomorrow I will finish the rest of them:

db1038.eqiad.wmnet
db1044.eqiad.wmnet
db1077.eqiad.wmnet
db1078.eqiad.wmnet
db1075.eqiad.wmnet

Note that by dropping hitcounter from labs hosts but keeping it on db1069 and restarting the server (which was lagged), replication broke:

MariaDB  labsdb1001 (none) > nopager
PAGER set to stdout
MariaDB  labsdb1001 (none) > SHOW SLAVE 's2' STATUS\G
*************************** 1. row ***************************
               Slave_IO_State: Waiting for master to send event
                  Master_Host: db1069.eqiad.wmnet
                  Master_User: repl
                  Master_Port: 3312
                Connect_Retry: 60
[...]
         Slave_IO_Running: Yes
            Slave_SQL_Running: No
              Replicate_Do_DB: 
          Replicate_Ignore_DB: 
           Replicate_Do_Table: 
       Replicate_Ignore_Table: 
      Replicate_Wild_Do_Table: %wik%.%,information_schema_p.%,heartbeat.%
  Replicate_Wild_Ignore_Table: 
                   Last_Errno: 1146
                   Last_Error: Error 'Table 'idwiki.hitcounter' doesn't exist' on query. Default database: 'idwiki'. Query: 'DELETE FROM `idwiki`.`hitcounter`'
                 Skip_Counter: 0
          Exec_Master_Log_Pos: 4099541
              Relay_Log_Space: 3380357106
[...]
                Last_IO_Errno: 0
                Last_IO_Error: 
               Last_SQL_Errno: 1146
               Last_SQL_Error: Error 'Table 'idwiki.hitcounter' doesn't exist' on query. Default database: 'idwiki'. Query: 'DELETE FROM `idwiki`.`hitcounter`'
  Replicate_Ignore_Server_Ids: 
             Master_Server_Id: 4294967295
[...]
1 row in set (0.00 sec)

Thanks for fixing that, I didn't realise that could happen because of db1069.

hitcounter table has been renamed across the board.
I will drop it for good in a few days.

TO_DROP_hitcounter table has been deleted from both DCs, I believe this ticket can be closed now.

MariaDB MARIADB db1083 enwiki > SHOW TABLES like '\_counter%';;
+-------------------------------+
| Tables_in_enwiki (\_counter%) |
+-------------------------------+
| _counters                     |
+-------------------------------+
1 row in set (0.00 sec)

Shall I drop it also from S1 then? The original post said S3 wikis, but I can research if it can be dropped from everywhere.

I wrote that, and I meant that I had to recreate them on s3 because I restarted one of those, creating issues there. If a table is dropped from production, we should make sure it doesn't exist anywhere, labs or production, from all instances.

In fact I literally said:

If Ok with that, drop both tables from *all servers* not from only a few.

Makes sense - I will review that, rename and if doesn't cause issues drop them.

So _counters and hitcounterstill exist at:

S1
enwiki

S2
bgwiki
bgwiktionary
cswiki
enwikiquote
enwiktionary
eowiki
fiwiki
idwiki
itwiki
nlwiki
nowiki
plwiki
ptwiki
svwiki
thwiki
trwiki
zhwiki

S3 - none

S4
commonswiki

S5
dewiki
wikidatawiki

S6
frwiki
jawiki
ruwiki

S7
arwiki
cawiki
eswiki
fawiki
frwiktionary
hewiki
huwiki
kowiki
metawiki
rowiki
ukwiki
viwiki

Using a slave with not much traffic of each shard I have renamed both tables to make sure there are no replication issues. If in 48h all looks fine, I will go ahead an drop them

S1

MariaDB MARIADB db1047 enwiki > rename table hitcounter to TO_DROP_hitcounter; rename table _counters to TO_DROP__counters;
Query OK, 0 rows affected (0.05 sec)

Query OK, 0 rows affected (0.01 sec)

S2

for i in `cat s2_list`; do mysql -hdb1063 $i -e "rename table hitcounter to TO_DROP_hitcounter; rename table _counters to TO_DROP__counters;";done

S4

mysql -hdb1059 commonswiki -e "rename table hitcounter to TO_DROP_hitcounter; rename table _counters to TO_DROP__counters;"

S5

for i in dewiki wikidatawiki; do mysql -hdb1045 $i -e "rename table hitcounter to TO_DROP_hitcounter; rename table _counters to TO_DROP__counters; ";done

S6

for i in frwiki jawiki ruwiki; do mysql -hdb1023 $i -e "rename table hitcounter to TO_DROP_hitcounter; rename table _counters to TO_DROP__counters; " ; done

S7

for i in `cat s3_list`; do mysql -hdb1039 $i -e "rename table hitcounter to TO_DROP_hitcounter; rename table _counters to TO_DROP__counters;";done

Dropped hitcounter and`_counters` tables from S1 enwiki

Dropped hitcounter and`_counters` tables from S2:
bgwiki
bgwiktionary
cswiki
enwikiquote
enwiktionary
eowiki
fiwiki
idwiki
itwiki
nlwiki
nowiki
plwiki
ptwiki
svwiki
thwiki
trwiki
zhwiki

However, replication broke in dbstore1002 and db1063 with:

Last_SQL_Error: Error 'Table 'itwiki._counters' doesn't exist' on query. Default database: 'itwiki'. Query: 'DELETE FROM `itwiki`.`_counters`'

So I have recreated them in those two servers in all the wikis.
The rest looks fine but I need to investigate why only these two failed.
Tables were empty anyways.

If needed this is what I did:

for i in `cat wikis`; do echo $i ; mysql --skip-ssl $i -e "CREATE TABLE \`hitcounter\` (   \`hc_id\` int(10) unsigned NOT NULL DEFAULT '0' ) ENGINE=MEMORY DEFAULT CHARSET=binary MAX_ROWS=25000;";done

for i in `cat wikis`; do echo $i ; mysql --skip-ssl $i -e "CREATE TABLE \`_counters\` (   \`name\` varbinary(16) NOT NULL,   \`counter\` bigint(20) NOT NULL,   PRIMARY KEY (\`name\`) ) ENGINE=MEMORY DEFAULT CHARSET=binary;";done

dbstore2002 (as expected) failed too

I have been investigating what happened and I have used dbstore2002 for this.

I have renamed the table again (S2), one of the ones that broke yesterday as per the logs:

MariaDB DBSTORE localhost enwikiquote > rename table hitcounter to TO_DROP_hitcounter;
Query OK, 0 rows affected (0.05 sec)

So far nothing has happened and replication is flowing fine.
I have been scanning binlogs to see what happened yesterday.
dbstore2002 received this from its upstream master (db2017 - db2017-bin.002356)

#161006 11:21:56 server id 180359173  end_log_pos 165112667     Query   thread_id=4770681       exec_time=0     error_code=0
SET TIMESTAMP=1475752916/*!*/;
/*!\C utf8 *//*!*/;
SET @@session.character_set_client=33,@@session.collation_connection=33,@@session.collation_server=63/*!*/;
DELETE FROM `enwikiquote`.`hitcounter`
/*!*/;
# at 165112667
#161006 11:21:56 server id 171970567  end_log_pos 165112705     GTID 0-171970567-3844612173

Looking at the upstream master from eqiad (db1018 - db1018-bin.001858).
We can see it was issued there too.

use `enwikiquote`/*!*/;
SET TIMESTAMP=1475752916/*!*/;
/*!\C utf8 *//*!*/;
SET @@session.character_set_client=33,@@session.collation_connection=33,@@session.collation_server=63/*!*/;
DELETE FROM `enwikiquote`.`hitcounter`
/*!*/;
# at 512047235
#161006 11:21:56 server id 180359173  end_log_pos 512047311     Query   thread_id=4770681       exec_time=0     error_code=0

And of course this happens with all the tables:

root@db1018:/srv/sqldata# /opt/wmf-mariadb10/bin/mysqlbinlog --defaults-file=/root/.my.cnf db1018-bin.001858 | grep hitcounter
DELETE FROM `bgwiki`.`hitcounter`
DELETE FROM `bgwiktionary`.`hitcounter`
DELETE FROM `cswiki`.`hitcounter`
DELETE FROM `enwikiquote`.`hitcounter`
DELETE FROM `enwiktionary`.`hitcounter`
DELETE FROM `eowiki`.`hitcounter`
DELETE FROM `fiwiki`.`hitcounter`
DELETE FROM `idwiki`.`hitcounter`
DELETE FROM `itwiki`.`hitcounter`
DELETE FROM `nlwiki`.`hitcounter`
DELETE FROM `nowiki`.`hitcounter`
DELETE FROM `plwiki`.`hitcounter`
DELETE FROM `ptwiki`.`hitcounter`
DELETE FROM `svwiki`.`hitcounter`
DELETE FROM `thwiki`.`hitcounter`
DELETE FROM `trwiki`.`hitcounter`
DELETE FROM `zhwiki`.`hitcounter`

After checking all this, I have proceed to drop the table I renamed above:

MariaDB DBSTORE localhost enwikiquote > SET SESSION sql_log_bin=0; drop table if exists TO_DROP_hitcounter;
Query OK, 0 rows affected (0.00 sec)

Query OK, 0 rows affected (0.02 sec)

Same with __counters

MariaDB DBSTORE localhost enwikiquote > SET SESSION sql_log_bin=0; drop table if exists TO_DROP__counters;

And now replication is working fine.
I will try with the rest of the wikis to see if I can reproduce it.

Let's just get rid of them ASAP. Including on non-mediawiki hosts such as db1069, analytics, labs, etc.

Yeah, so far S1 and S3 are clean.
S2 only db1063, dbstore1002 and dbstore2002 (the ones that broke yesterday) have them for the given wikis. I will clean those three and keep going to S4 and so forth.

dbstore1001 broke now too.
I fixed it by doing the same thing I did with the hosts.

root@dbstore1001:~# for i in `cat wikis`; do echo $i ; mysql --skip-ssl $i -e "CREATE TABLE \`hitcounter\` (   \`hc_id\` int(10) unsigned NOT NULL DEFAULT '0' ) ENGINE=MEMORY DEFAULT CHARSET=binary MAX_ROWS=25000;";done
bgwiki
bgwiktionary
cswiki
enwikiquote
enwiktionary
eowiki
fiwiki
idwiki
itwiki
nlwiki
nowiki
plwiki
ptwiki
svwiki
thwiki
trwiki
zhwiki
root@dbstore1001:~# for i in `cat wikis`; do echo $i ; mysql --skip-ssl $i -e "CREATE TABLE \`_counters\` (   \`name\` varbinary(16) NOT NULL,   \`counter\` bigint(20) NOT NULL,   PRIMARY KEY (\`name\`) ) ENGINE=MEMORY DEFAULT CHARSET=binary;";done
bgwiki
bgwiktionary
cswiki
enwikiquote
enwiktionary
eowiki
fiwiki
idwiki
itwiki
nlwiki
nowiki
plwiki
ptwiki
svwiki
thwiki
trwiki
zhwiki

Mentioned in SAL (#wikimedia-operations) [2016-10-10T08:38:10Z] <marostegui> Dropping hitcounter, _counter memory tables in S2 - dbstore2002 - T132837

Mentioned in SAL (#wikimedia-operations) [2016-10-10T08:53:56Z] <marostegui> Dropping hitcounter, _counter memory tables in S2 - dbstore1001 - T132837

Mentioned in SAL (#wikimedia-operations) [2016-10-10T08:57:50Z] <marostegui> Dropping hitcounter, _counter memory tables in S2 - dbstore1002 - T132837

Mentioned in SAL (#wikimedia-operations) [2016-10-10T09:39:11Z] <marostegui> Dropping hitcounter, _counter memory tables in S2 - db1063- T132837

Mentioned in SAL (#wikimedia-operations) [2016-10-10T10:52:31Z] <marostegui> Dropping hitcounter, _counter memory tables in S2 - db1069 - T132837

I have dropped the tables again from

S2, the hosts that failed: db1063, db1069, dbstore1001, dbstore1002, dbstore2002
It has been a while now and none has broken (when they broke the first time it was within 5-10 minutes)

If they need to be recreated:

for i in `cat wikis`; do echo $i ; mysql --skip-ssl $i -e "CREATE TABLE \`hitcounter\` (   \`hc_id\` int(10) unsigned NOT NULL DEFAULT '0' ) ENGINE=MEMORY DEFAULT CHARSET=binary MAX_ROWS=25000;";done

for i in `cat wikis`; do echo $i ; mysql --skip-ssl $i -e "CREATE TABLE \`_counters\` (   \`name\` varbinary(16) NOT NULL,   \`counter\` bigint(20) NOT NULL,   PRIMARY KEY (\`name\`) ) ENGINE=MEMORY DEFAULT CHARSET=binary;";done

I am going to move onto S4 now.

Mentioned in SAL (#wikimedia-operations) [2016-10-10T13:38:54Z] <marostegui> Dropping hitcounter, _counter memory tables in S4 - db1040 (master) - T132837

Regarding S4, I have dropped both tables from the master db1040 (without replicating the statement) to make sure nothing is issued from the master downstream, as it looks it happened with S2 and broke some servers.

So I have done this only in db1040:

set session sql_log_bin=0; drop table if exists hitcounter;drop table if exists _counters;

If we need to create them again:

mysql --skip-ssl commonswiki -e "set session sql_log_bin=0; CREATE TABLE IF NOT EXISTS \`_counters\` (   \`name\` varbinary(16) NOT NULL,   \`counter\` bigint(20) NOT NULL,   PRIMARY KEY (\`name\`) ) ENGINE=MEMORY DEFAULT CHARSET=binary;"

mysql --skip-ssl commonswiki -e "set session sql_log_bin=0; CREATE TABLE IF NOT EXISTS \`hitcounter\` (   \`hc_id\` int(10) unsigned NOT NULL DEFAULT '0' ) ENGINE=MEMORY DEFAULT CHARSET=binary MAX_ROWS=25000;"

I have finished cleaning up the tables in S4 now.
Before dropping them I have searched in the binlogs to see if another DELETE fromwas issued from the master as it happened with S2.
I found none, so looks like dropping it from the master without replicating is a good idea to prevent those.

I am going to move to S5 now (dewiki, wikidatawiki) and clean that up too

While getting ready to drop the table from S5 master I issued a SELECT count from it and I saw that after it finished (0 rows) it issued a DELETE from in the binlog which arrives to the slaves

I assume S2 had the same issue and as the master was the last one getting its table dropped, if it issued the DELETE obviously it broke replication as none of the slaves had it any longer.

MariaDB's documentation about memory tables isn't too extense but Oracle's one says:

To synchronize master and slave MEMORY tables, when a MEMORY table is used on a master for the first time since it was started, a DELETE statement is written to the master's binary log, to empty the table on the slaves also.

So, I have dropped hitcounter_counters from S5 master now (db1049 - DBs: dewiki and wikidatawiki).

root@db1049:~# mysql --skip-ssl dewiki -e "set session sql_log_bin=0; drop table if exists hitcounter;drop table if exists _counters;"

root@db1049:~# mysql --skip-ssl wikidatawiki -e "set session sql_log_bin=0; drop table if exists hitcounter;drop table if exists _counters;"

Mentioned in SAL (#wikimedia-operations) [2016-10-13T07:14:02Z] <marostegui> Dropping hitcounter, _counter memory tables in S5 (dewiki, wikidatawiki) - T132837

Tables dropped from S5 (dewiki, wikidatawiki)

Mentioned in SAL (#wikimedia-operations) [2016-10-13T07:25:56Z] <marostegui> Dropping hitcounter, _counter memory tables in S6 (frwiki jawiki ruwiki) on db1050 (master) - T132837

I have dropped the tables in S6 master (db1050) wikis: frwiki jawiki ruwiki

root@neodymium:/home/marostegui/git/software/dbtools# for i in frwiki jawiki ruwiki ; do echo $i; mysql -hdb1050 $i -e "set session sql_log_bin=0; drop table if exists hitcounter;drop table if exists _counters;";done
frwiki
jawiki
ruwiki

Mentioned in SAL (#wikimedia-operations) [2016-10-13T12:53:52Z] <marostegui> Dropping hitcounter, _counter memory tables in S6 (frwiki jawiki ruwiki) - T132837

S6 (ruwiki,jawiki,frwiki) got the tables removed.

Mentioned in SAL (#wikimedia-operations) [2016-10-14T07:17:20Z] <marostegui> Dropping hitcounter, _counter memory tables in S7 on db1041 (master) - T132837

S7 master db1041 got its tables removed from the following wikis

for i in `cat s7`; do echo $i; mysql -hdb1041 $i -e "set session sql_log_bin=0; drop table if exists hitcounter;drop table if exists _counters;";done
arwiki
cawiki
eswiki
fawiki
frwiktionary
hewiki
huwiki
kowiki
metawiki
rowiki
ukwiki
viwiki

Mentioned in SAL (#wikimedia-operations) [2016-10-17T07:15:35Z] <marostegui> Dropping memory tables hitcounter, _counters from S7 - T132837

I have dropped the tables from S7 hosts.
So all the shards are done now \o/.

I believe this ticket can be finally closed

I've just run this and it seems there are some views left:

root@neodymium:~$ while read host port; do mysql -h $host -P $port -e "SELECT * FROM information_schema.tables WHERE table_name IN ('hitcounter', '_counters')"; done < software/dbtools/s1.hosts
[...]ULL	NULL	View 'zhwiktionary_p.hitcounter' references invalid table(s) or column(s) or function(s) or definer/invoker of view lack rights to use them
def	zuwiki_p	hitcounter	VIEW	NULL	NULL	NULL	NULL	NULL	NULL	NULL	NULL	NULL	NULL	NULL	NULL	NULL	NULL	NULL	NULL	View 'zuwiki_p.hitcounter' references invalid table(s) or column(s) or function(s) or definer/invoker of view lack rights to use them
def	zuwikibooks_p	hitcounter	VIEW	NULL	NULL	NULL	NULL	NULL	NULL	NULL	NULL	NULL	NULL	NULL	NULL	NULL	NULL	NULL	NULL	View 'zuwikibooks_p.hitcounter' references invalid table(s) or column(s) or function(s) or definer/invoker of view lack rights to use them
def	zuwiktionary_p	hitcounter	VIEW	NULL	NULL	NULL	NULL	NULL	NULL	NULL	NULL	NULL	NULL	NULL	NULL	NULL	NULL	NULL	NULL	View 'zuwiktionary_p.hitcounter' references invalid table(s) or column(s) or function(s) or definer/invoker of view lack rights to use them

Oh, I didn't know we had views...lovely. I will get rid of them.

Thanks for the command!

Running that same command across all the shards reveals no more views (or tables in weird database names like: p50380g50497__wikidb_enwiki

Thanks for pointing this out

I will ping @chasemp here after being resolved, in case some on the create-views script still references these 2 tables (do you need a separate task reminding that?).

I don't see the _counters table referenced in maintain-views or maintain-meta_p

but hitcounter

https://gerrit.wikimedia.org/r/#/c/316348/