Drop EventLogging tables for ImageMetricsLoadingTime and ImageMetricsCorsSupport
Closed, Resolved · Public · 1 Estimated Story Point

Description

We've now de-deployed the ImageMetrics extension and have equivalent sources of data, so this data can be deleted.

Event Timeline

This means dropping:

ImageMetricsLoadingTime_10078363
ImageMetricsCorsSupport_10884476
ImageMetricsCorsSupport_11686678
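
For reference, a minimal sketch of what the checks and drops could look like (the actual commands the DBAs ran appear further down in this task; sql_log_bin is session-scoped, so the statements have to be run on each host individually):

-- Confirm each table is idle before dropping it
SELECT MAX(timestamp) FROM ImageMetricsLoadingTime_10078363;
SELECT MAX(timestamp) FROM ImageMetricsCorsSupport_10884476;
SELECT MAX(timestamp) FROM ImageMetricsCorsSupport_11686678;

-- Drop without writing to the binary log, so the drops are not replicated
SET SESSION sql_log_bin = 0;
DROP TABLE IF EXISTS ImageMetricsLoadingTime_10078363;
DROP TABLE IF EXISTS ImageMetricsCorsSupport_10884476;
DROP TABLE IF EXISTS ImageMetricsCorsSupport_11686678;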

Mentioned in SAL [2016-07-27T06:56:53Z] <jynus> dropping tables from m4 shard T141407

I have dropped these 3 tables from everywhere except dbstore2002, which is usually not accessible to users.

I would ask to keep them like that for a few weeks, in case something breaks, and then perform a final deletion if nobody complains.

jcrespo triaged this task as Medium priority. Jul 27 2016, 7:29 AM

@Jdforrester-WMF It seems nobody complained, ok to irreversibly delete these tables?

Please do so. Thank you!

@Jdforrester-WMF Metrics are still coming in as recently as 20160808090453, and thus these tables are being recreated by the EventLogging application. Unless we do something about this, this task will be impossible to complete.

Isn't client-side code caching great?

OK, wait a month and then try?

:-)

To be fair, I understand why those would be generated. But shouldn't we have a way to discard those on the server side: "even if you receive event X, ignore it"?

I believe we could do this by deleting the relevant schema pages from meta (so the events would then fail validation), but that's a pretty hacky way (and obliterates the history for people coming along in the future and wanting to know what we did). Created that task for the wishlist item though.

Let's wait, then. It is not that important, it was just annoying.

Mentioned in SAL [2016-09-08T08:23:12Z] <marostegui> Drop tables: ImageMetricsLoadingTime_10078363 and ImageMetricsCorsSupport_11686678 - T141407

Hello,

We have executed the following on db1046, db1047 and dbstore1002:

set sql_log_bin=0 ; drop table ImageMetricsLoadingTime_10078363 ; drop table ImageMetricsCorsSupport_11686678;

This is what I have seen regarding the three tables:

  • ImageMetricsCorsSupport_11686678 still has activity
root@db1046:/srv/sqldata# ls -Rlh | grep ImageMetrics
-rw-rw---- 1 mysql mysql   40K Sep  8 01:05 _log_ImageMetricsCorsSupport_11686678_key_ix_ImageMetricsCorsSupport_11686678_timestamp_acd9f031_3_1b_P_0.tokudb
-rw-rw---- 1 mysql mysql   40K Sep  8 01:05 _log_ImageMetricsCorsSupport_11686678_key_ix_ImageMetricsCorsSupport_11686678_uuid_acd9f2ed_3_1b_B_0.tokudb
-rw-rw---- 1 mysql mysql   32K Sep  8 01:05 _log_ImageMetricsCorsSupport_11686678_main_acd9f02a_2_1b.tokudb
-rw-rw---- 1 mysql mysql   64K Sep  8 01:05 _log_ImageMetricsCorsSupport_11686678_status_acd9f02a_1_1b.tokudb
-rw-rw---- 1 mysql mysql   40K Jul 27 08:20 _log_ImageMetricsLoadingTime_10078363_key_ix_ImageMetricsLoadingTime_10078363_timestamp_ac67eb48_3_1b_P_0.tokudb
-rw-rw---- 1 mysql mysql   40K Jul 27 08:20 _log_ImageMetricsLoadingTime_10078363_key_ix_ImageMetricsLoadingTime_10078363_uuid_ac67eb3a_3_1b_B_0.tokudb
-rw-rw---- 1 mysql mysql   32K Jul 27 08:20 _log_ImageMetricsLoadingTime_10078363_main_ac67eb2f_2_1b.tokudb
-rw-rw---- 1 mysql mysql   64K Jul 27 09:20 _log_ImageMetricsLoadingTime_10078363_status_ac67eb2f_1_1b.tokudb
-rw-rw---- 1 mysql mysql 4.8K Jul 28 07:07 ImageMetricsCorsSupport_11686678.frm
-rw-rw---- 1 mysql mysql 6.6K Jul 27 08:19 ImageMetricsLoadingTime_10078363.frm



MariaDB EVENTLOGGING m4 localhost log > select timestamp from ImageMetricsCorsSupport_11686678 order by timestamp desc limit 1\G
*************************** 1. row ***************************
timestamp: 20160908010515
1 row in set (0.00 sec)

So I guess it will be recreated again; can we do something about that from the code side?

  • ImageMetricsCorsSupport_10884476 looks like it is gone; it is not on the master
  • ImageMetricsLoadingTime_10078363 stopped receiving writes back in July
MariaDB EVENTLOGGING m4 localhost log > select timestamp from ImageMetricsLoadingTime_10078363 order by timestamp asc limit 1\G
*************************** 1. row ***************************
timestamp: 20160727075704
1 row in set (0.00 sec)

In case we need to recover anything there is a backup at:

es2001:/srv/backups/m4

I will check during the week if the table is back and ping you again.
Could you check from your side too?

Thanks

On Friday I will check again if the tables have been recreated and get back to you @Jdforrester-WMF

@Jdforrester-WMF It looks like ImageMetricsCorsSupport_11686678 was created yesterday. Can you check from your side why this is still being recreated and still receiving inserts (it has 4 rows)?

MariaDB  db1046 log > show table status like 'ImageMetricsCorsSupport_11686678'\G
*************************** 1. row ***************************
           Name: ImageMetricsCorsSupport_11686678
         Engine: TokuDB
        Version: 10
     Row_format: Dynamic
           Rows: 4
 Avg_row_length: 246
    Data_length: 984
Max_data_length: 9223372036854775807
   Index_length: 232
      Data_free: 36864
 Auto_increment: 5
    Create_time: 2016-09-08 08:57:47
    Update_time: 2016-09-08 20:27:38
     Check_time: NULL
      Collation: utf8_general_ci
       Checksum: NULL
 Create_options: `compression`='tokudb_zlib'
        Comment:
MariaDB  db1046 log > select timestamp from ImageMetricsCorsSupport_11686678;
+----------------+
| timestamp      |
+----------------+
| 20160908010520 |
| 20160908085746 |
| 20160908085749 |
| 20160908202640 |
+----------------+
4 rows in set (0.00 sec)

The reference to this was dropped from the code in https://gerrit.wikimedia.org/r/#/c/230982/ which was merged into master on 13 August 2015 and went into production in 1.26wmf19 on 18/19/20 August 2015. Even with the one month cache that we had back then, it should have entirely disappeared by 21 September 2015.

The only thing I can think of is really poorly-behaved third-party caches supplying users with long-dead copies of CorsLogger.js. Maybe @Tgr has some ideas?

All four records seem to be from the same user (geolocated to US, looking at mobile jawiki, using IE 9), over the span of almost a full day. I can't really imagine how broken caching could cause this since the JS file should not appear in the generated HTML anymore, but maybe the user is looking at old saved HTML snapshots or doing something equally weird?
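
A minimal way to check that the rows come from a single client, assuming the standard EventCapsule columns (userAgent, webHost and wiki - the column names here are assumptions):

-- Group the handful of rows by client fingerprint; one group suggests one client
SELECT userAgent, webHost, wiki, COUNT(*) AS events
FROM ImageMetricsCorsSupport_11686678
GROUP BY userAgent, webHost, wiki;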

There should be a way to disable schemas; as a poor man's solution, just move it into talk space IMO.

I don't know if this sheds more light on the issue, but the only table recreated is that one.
I just checked and ImageMetricsLoadingTime_10078363 has not been recreated since I dropped it last Thursday.

I am tempted to close this ticket, as the only table that is still in use (39 records now) is ImageMetricsCorsSupport_11686678.
The other two never came back.

This is what we have now.

MariaDB EVENTLOGGING m4 localhost log > select count(*) from ImageMetricsCorsSupport_11686678;
+----------+
| count(*) |
+----------+
|       39 |
+----------+
1 row in set (0.00 sec)

MariaDB EVENTLOGGING m4 localhost log > select timestamp from ImageMetricsCorsSupport_11686678;
+----------------+
| timestamp      |
+----------------+
| 20160908010520 |
| 20160908085746 |
| 20160908085749 |
| 20160908202640 |
| 20160910173040 |
| 20160910173047 |
| 20160912040056 |
| 20160912040057 |
| 20160914144037 |
| 20160914144041 |
| 20160915213445 |
| 20160915213446 |
| 20160917145941 |
| 20160917145944 |
| 20160918055836 |
| 20160918175434 |
| 20160918175435 |
| 20160921044609 |
| 20160921044610 |
| 20160921194622 |
| 20160921194624 |
| 20160922142556 |
| 20160922142558 |
| 20160923213942 |
| 20160923213943 |
| 20160924131755 |
| 20160925134350 |
| 20160925134355 |
| 20160926194606 |
| 20160926194607 |
| 20161001000621 |
| 20161001000624 |
| 20161001114604 |
| 20161001114610 |
| 20161001165945 |
| 20161001165946 |
| 20161002195105 |
| 20161002195107 |
| 20161003151327 |
+----------------+

Moving to radar, as I think this is being taken care of by the DBAs.

Not really; we are a bit lost, as we keep seeing the table ImageMetricsCorsSupport_11686678 being regenerated once we drop it :-(

And it is not only being recreated, but also being used as of today:

root@EVENTLOGGING m4[log]> select timestamp from ImageMetricsCorsSupport_11686678 order by timestamp desc limit 5;
+----------------+
| timestamp      |
+----------------+
| 20170317043318 |
| 20170317034449 |
| 20170317034443 |
| 20170311221131 |
| 20170311221129 |
+----------------+
5 rows in set (0.00 sec)

Ping @Jdforrester-WMF: it looks like you need to remove the instrumentation that is sending events.

Probably some Labs box or personal Vagrant install still has ImageMetrics installed. I guess this is a missing feature in EventLogging - there should be a way to archive a schema. Right now, if a vandal sent a request for each revision of each schema, they could force the creation of hundreds of tables - could be a DoS vector.

> Probably some Labs box or personal Vagrant install still has ImageMetrics installed.

Neither would get sent to the production instance of EL; Vagrant has its own receiving endpoint and so does Labs.

> Right now, if a vandal sent a request for each revision of each schema, they could force the creation of hundreds of tables - could be a DoS vector.

Yes, true, as long as the db consumer is alive (hopefully not for long).

Some broken spider then, which has cached some old JS for infinity. All requests are from the same client (claims to be an IE9 browser on jawiki).

Change 343809 had a related patch set uploaded (by Nuria):
[operations/puppet] Blacklisting ImageMetrics schemas

https://gerrit.wikimedia.org/r/343809

Ping @jcrespo @Jdforrester-WMF: we will be adding these schemas to the blacklist, so their data will not be sent to the database consumer: https://gerrit.wikimedia.org/r/343809

Once this change gets merged and puppet runs, the tables can be deleted and (regardless of incoming requests) they will not be recreated.

Sounds good! Thank you @Nuria, once it is merged I will drop it again and see what happens! If all goes well, we will be able to close this ticket \o/

Change 343809 merged by Ottomata:
[operations/puppet] Blacklisting ImageMetrics schemas

https://gerrit.wikimedia.org/r/343809

We have merged the change and restarted the processors; the tables can be deleted (cc @Marostegui).

Awesome! Thank you very much.
I have done this on db1046, db1047 and dbstore1002:

root@EVENTLOGGING m4[log]> select timestamp from ImageMetricsCorsSupport_11686678 order by timestamp desc limit 5;
+----------------+
| timestamp      |
+----------------+
| 20170318233819 |
| 20170318205355 |
| 20170318205351 |
| 20170317043320 |
| 20170317043318 |
+----------------+
5 rows in set (0.00 sec)

root@EVENTLOGGING m4[log]> set session sql_log_bin=0;
Query OK, 0 rows affected (0.00 sec)

root@EVENTLOGGING m4[log]> rename table ImageMetricsCorsSupport_11686678 to TO_DROP_ImageMetricsCorsSupport_11686678;
Query OK, 0 rows affected (0.15 sec)

If the table isn't recreated by the end of the week, I will drop it for good and close this ticket!
Thank you guys!!!
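
If it stays gone, the final cleanup would be a short sketch along these lines, again run on each host with binary logging disabled for the session:

-- Drop the renamed copy for good, without replicating the statement
SET SESSION sql_log_bin = 0;
DROP TABLE IF EXISTS TO_DROP_ImageMetricsCorsSupport_11686678;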

So far so good!

root@EVENTLOGGING m4[log]> show tables like 'ImageMetricsCorsSupport%';
Empty set (0.00 sec)

Let's give it a few more days

Bad news, the table is there again @Nuria :-(

root@EVENTLOGGING m4[log]> select @@hostname;
+------------+
| @@hostname |
+------------+
| db1046     |
+------------+
1 row in set (0.00 sec)

root@EVENTLOGGING m4[log]> show tables like 'ImageMetricsCorsSupport%';
+------------------------------------------+
| Tables_in_log (ImageMetricsCorsSupport%) |
+------------------------------------------+
| ImageMetricsCorsSupport_11686678         |
+------------------------------------------+
1 row in set (0.00 sec)

root@EVENTLOGGING m4[log]> select timestamp from ImageMetricsCorsSupport_11686678;
+----------------+
| timestamp      |
+----------------+
| 20170318233820 |
+----------------+
1 row in set (0.00 sec)

Note that the record is from 20170318, a timestamp before the blacklisting changes were merged (2017-03-21). Database insertion on EL is not real time, so I think either: 1) the record was enqueued in Kafka for insertion before the blacklisting (unlikely), or 2) it was inserted on the master and only now replicated elsewhere.
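
One way to narrow this down, assuming the three hosts mentioned earlier in the task, is to compare the newest timestamp on the master and on each replica:

-- Run on db1046 (master), db1047 and dbstore1002 and compare the results
SELECT @@hostname, MAX(timestamp) AS newest_event
FROM ImageMetricsCorsSupport_11686678;

If all three hosts return the same timestamp and it predates the blacklist merge, the row was most likely queued for insertion before the change rather than produced by a new event.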

That output is from the master (db1046).
I can drop it again and see if it gets recreated during the weekend.

I dropped it on all the hosts:

root@neodymium:/home/marostegui/databases_s5# for i in db1046 db1047 dbstore1002; do echo $i; mysql --skip-ssl -h$i log -e "set session sql_log_bin=0; drop table if exists ImageMetricsCorsSupport_11686678;";done
db1046
db1047
dbstore1002
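
A quick sketch of how to verify it stays gone on all three hosts, mirroring the loop above (same hostnames and options):

for i in db1046 db1047 dbstore1002; do
  echo $i
  mysql --skip-ssl -h$i log -e "show tables like 'ImageMetricsCorsSupport%'"
done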

Mmm... master? Man, Bermuda triangle problem. I was expecting this to come from the slave. Triple-checking that there is not a typo in the blacklisting.

Nuria set the point value for this task to 1.
Nuria moved this task from Next Up to In Progress on the Analytics-Kanban board.

Maybe I am wrong, but what I know about EL is that we have:
db1046, db1047 and dbstore1002, with db1046 being the master.

Anyways, it is now deleted, we will see what happens on Monday!

So far so good, but let's give it a couple more days:

root@EVENTLOGGING m4[log]> show tables like 'ImageMetricsCorsSupport%';
Empty set (0.00 sec)

mysql:research@analytics-store.eqiad.wmnet [log]> show tables like 'ImageMetricsCorsSupport%';
Empty set (0.02 sec)

I did this query on the slave; all looks good. Closing ticket.