Description
We've now de-deployed the ImageMetrics extension and have equivalent sources of data, so this data can be deleted.
Details
Subject | Repo | Branch | Lines +/-
---|---|---|---
Blacklisting ImageMetrics schemas | operations/puppet | production | +2 -1
Status | Subtype | Assigned | Task
---|---|---|---
Open | Feature | None | T18660 Database table cleanup (tracking)
Resolved | | LSobanski | T54921 Database tables to be dropped on Wikimedia wikis and other WMF databases (tracking)
Resolved | | Marostegui | T141407 Drop EventLogging tables for ImageMetricsLoadingTime and ImageMetricsCorsSupport
Event Timeline
This means dropping:
- ImageMetricsLoadingTime_10078363
- ImageMetricsCorsSupport_10884476
- ImageMetricsCorsSupport_11686678
Mentioned in SAL [2016-07-27T06:56:53Z] <jynus> dropping tables from m4 shard T141407
I have dropped these 3 tables from everywhere except dbstore2002, which is usually not accessible to users.
I would ask that we keep them like that for a few weeks in case something breaks, and then perform a final deletion if nobody complains.
@Jdforrester-WMF Metrics are still coming in, as recently as 20160808090453, so these tables are being recreated by the eventlogging application. Unless we do something about this, this task will be impossible to complete.
:-)
To be fair, I understand why those would be generated. But shouldn't we have a way to discard them on the server side ("even if you receive event X, ignore it")?
I believe we could do this by deleting the relevant schema pages from meta (so the events would then fail validation), but that's a pretty hacky way (and obliterates the history for people coming along in the future and wanting to know what we did). Created that task for the wishlist item though.
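For context, the recreation happens because the EventLogging MySQL consumer auto-creates a table for any event that passes schema validation and does not yet have one. A rough, self-contained toy of that behaviour (illustrative only, not the real consumer code; the names here are made up):

```python
# Illustrative toy, not the real EventLogging consumer: it just shows why a
# dropped table reappears as long as valid events for that schema keep arriving.
tables = {}  # stands in for the `log` database on the m4 master


def insert_event(event):
    # In the real pipeline the event is first validated against its schema on
    # meta; invalid events never reach the database consumer.
    table = "{schema}_{revision}".format(**event)  # e.g. ImageMetricsCorsSupport_11686678
    if table not in tables:   # table was dropped (or never existed)...
        tables[table] = []    # ...so the consumer auto-creates it again
    tables[table].append(event)


insert_event({"schema": "ImageMetricsCorsSupport", "revision": 11686678,
              "timestamp": "20160908010515"})
print(sorted(tables))  # ['ImageMetricsCorsSupport_11686678']
```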
Mentioned in SAL [2016-09-08T08:23:12Z] <marostegui> Drop tables: ImageMetricsLoadingTime_10078363 and ImageMetricsCorsSupport_11686678 - T141407
Hello,
We have executed the following in db1046, db1047 and dbstore1002:
```
set sql_log_bin=0;
drop table ImageMetricsLoadingTime_10078363;
drop table ImageMetricsCorsSupport_11686678;
```
This is what I have seen regarding the three tables:
- ImageMetricsCorsSupport_11686678 still has activity
```
root@db1046:/srv/sqldata# ls -Rlh | grep ImageMetrics
-rw-rw---- 1 mysql mysql  40K Sep  8 01:05 _log_ImageMetricsCorsSupport_11686678_key_ix_ImageMetricsCorsSupport_11686678_timestamp_acd9f031_3_1b_P_0.tokudb
-rw-rw---- 1 mysql mysql  40K Sep  8 01:05 _log_ImageMetricsCorsSupport_11686678_key_ix_ImageMetricsCorsSupport_11686678_uuid_acd9f2ed_3_1b_B_0.tokudb
-rw-rw---- 1 mysql mysql  32K Sep  8 01:05 _log_ImageMetricsCorsSupport_11686678_main_acd9f02a_2_1b.tokudb
-rw-rw---- 1 mysql mysql  64K Sep  8 01:05 _log_ImageMetricsCorsSupport_11686678_status_acd9f02a_1_1b.tokudb
-rw-rw---- 1 mysql mysql  40K Jul 27 08:20 _log_ImageMetricsLoadingTime_10078363_key_ix_ImageMetricsLoadingTime_10078363_timestamp_ac67eb48_3_1b_P_0.tokudb
-rw-rw---- 1 mysql mysql  40K Jul 27 08:20 _log_ImageMetricsLoadingTime_10078363_key_ix_ImageMetricsLoadingTime_10078363_uuid_ac67eb3a_3_1b_B_0.tokudb
-rw-rw---- 1 mysql mysql  32K Jul 27 08:20 _log_ImageMetricsLoadingTime_10078363_main_ac67eb2f_2_1b.tokudb
-rw-rw---- 1 mysql mysql  64K Jul 27 09:20 _log_ImageMetricsLoadingTime_10078363_status_ac67eb2f_1_1b.tokudb
-rw-rw---- 1 mysql mysql 4.8K Jul 28 07:07 ImageMetricsCorsSupport_11686678.frm
-rw-rw---- 1 mysql mysql 6.6K Jul 27 08:19 ImageMetricsLoadingTime_10078363.frm

MariaDB EVENTLOGGING m4 localhost log > select timestamp from ImageMetricsCorsSupport_11686678 order by timestamp desc limit 1\G
*************************** 1. row ***************************
timestamp: 20160908010515
1 row in set (0.00 sec)
```
So I guess it will be recreated again; can we do something about that from the code side?
- ImageMetricsCorsSupport_10884476 looks like it is gone; it is not on the master
- ImageMetricsLoadingTime_10078363 stopped its writes back in July
```
MariaDB EVENTLOGGING m4 localhost log > select timestamp from ImageMetricsLoadingTime_10078363 order by timestamp asc limit 1\G
*************************** 1. row ***************************
timestamp: 20160727075704
1 row in set (0.00 sec)
```
In case we need to recover anything there is a backup at:
es2001:/srv/backups/m4
I will check during the week if the table is back and ping you again.
Could you check from your side too?
Thanks
On Friday I will check again if the tables have been recreated and get back to you @Jdforrester-WMF
@Jdforrester-WMF it looks like ImageMetricsCorsSupport_11686678 was recreated yesterday. Can you check from your side why this is still being recreated and written to (it has 4 rows)?
```
MariaDB db1046 log > show table status like 'ImageMetricsCorsSupport_11686678'\G
*************************** 1. row ***************************
           Name: ImageMetricsCorsSupport_11686678
         Engine: TokuDB
        Version: 10
     Row_format: Dynamic
           Rows: 4
 Avg_row_length: 246
    Data_length: 984
Max_data_length: 9223372036854775807
   Index_length: 232
      Data_free: 36864
 Auto_increment: 5
    Create_time: 2016-09-08 08:57:47
    Update_time: 2016-09-08 20:27:38
     Check_time: NULL
      Collation: utf8_general_ci
       Checksum: NULL
 Create_options: `compression`='tokudb_zlib'
        Comment:

MariaDB db1046 log > select timestamp from ImageMetricsCorsSupport_11686678;
+----------------+
| timestamp      |
+----------------+
| 20160908010520 |
| 20160908085746 |
| 20160908085749 |
| 20160908202640 |
+----------------+
4 rows in set (0.00 sec)
```
The reference to this was dropped from the code in https://gerrit.wikimedia.org/r/#/c/230982/ which was merged into master on 13 August 2015 and went into production in 1.26wmf19 on 18/19/20 August 2015. Even with the one month cache that we had back then, it should have entirely disappeared by 21 September 2015.
The only thing I can think of is really poorly-behaved third-party caches supplying users with long-dead copies of CorsLogger.js. Maybe @Tgr has some ideas?
All four records seem to be from the same user (geolocated to US, looking at mobile jawiki, using IE 9), over the span of almost a full day. I can't really imagine how broken caching could cause this since the JS file should not appear in the generated HTML anymore, but maybe the user is looking at old saved HTML snapshots or doing something equally weird?
There should be a way to disable schemas; as a poor man's solution, just move it into talk space IMO.
I don't know if this sheds more light on the issue, but the only table that has been recreated is that one.
I just checked and ImageMetricsLoadingTime_10078363 has not been recreated since I dropped it last Thursday.
I am tempted to close this ticket, as the only table that is still in use (39 records now) is ImageMetricsCorsSupport_11686678.
The other two never came back.
This is what we have now.
```
MariaDB EVENTLOGGING m4 localhost log > select count(*) from ImageMetricsCorsSupport_11686678;
+----------+
| count(*) |
+----------+
|       39 |
+----------+
1 row in set (0.00 sec)

MariaDB EVENTLOGGING m4 localhost log > select timestamp from ImageMetricsCorsSupport_11686678;
+----------------+
| timestamp      |
+----------------+
| 20160908010520 |
| 20160908085746 |
| 20160908085749 |
| 20160908202640 |
| 20160910173040 |
| 20160910173047 |
| 20160912040056 |
| 20160912040057 |
| 20160914144037 |
| 20160914144041 |
| 20160915213445 |
| 20160915213446 |
| 20160917145941 |
| 20160917145944 |
| 20160918055836 |
| 20160918175434 |
| 20160918175435 |
| 20160921044609 |
| 20160921044610 |
| 20160921194622 |
| 20160921194624 |
| 20160922142556 |
| 20160922142558 |
| 20160923213942 |
| 20160923213943 |
| 20160924131755 |
| 20160925134350 |
| 20160925134355 |
| 20160926194606 |
| 20160926194607 |
| 20161001000621 |
| 20161001000624 |
| 20161001114604 |
| 20161001114610 |
| 20161001165945 |
| 20161001165946 |
| 20161002195105 |
| 20161002195107 |
| 20161003151327 |
+----------------+
```
Not really; we are a bit lost, as we keep seeing the table ImageMetricsCorsSupport_11686678 regenerated once we drop it :-(
And it is not only being recreated, but used as of today:
```
root@EVENTLOGGING m4[log]> select timestamp from ImageMetricsCorsSupport_11686678 order by timestamp desc limit 5;
+----------------+
| timestamp      |
+----------------+
| 20170317043318 |
| 20170317034449 |
| 20170317034443 |
| 20170311221131 |
| 20170311221129 |
+----------------+
5 rows in set (0.00 sec)
```
Ping @Jdforrester-WMF: it looks like you need to remove the instrumentation that is sending these events.
Probably some Labs box or personal vagrant install still has ImageMetrics installed. I guess this is a missing feature in EventLogging: there should be a way to archive a schema. Right now, if a vandal sent a request for each revision of each schema, they could force the creation of hundreds of tables, which could be a DoS vector.
> Probably some Labs box or personal vagrant install still has ImageMetrics installed.

Neither would get sent to the production instance of EL; vagrant has its own receiving endpoint and so does labs.

> Right now, if a vandal sent a request for each revision of each schema, they could force the creation of hundreds of tables, which could be a DoS vector.

Yes, true, as long as the db consumer is alive (hopefully not for long).
Some broken spider then, which has cached some old JS for infinity. All requests are from the same client (claims to be an IE9 browser on jawiki).
Change 343809 had a related patch set uploaded (by Nuria):
[operations/puppet] Blacklisting ImageMetrics schemas
Ping @jcrespo @Jdforrester-WMF: I will be adding these schemas to the blacklist, so their data will not be sent to the database consumer: https://gerrit.wikimedia.org/r/343809
Once this change gets merged and puppet runs, the tables can be deleted and (regardless of incoming requests) they will not be recreated.
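Roughly the idea, as a hypothetical sketch (the real blacklist lives in the operations/puppet change above; the regex and function names here are made up for illustration):

```python
import re

# Hypothetical stand-in for the schema blacklist added via operations/puppet;
# the pattern and helper below are illustrative, not the actual configuration.
SCHEMA_BLACKLIST = re.compile(r"^ImageMetrics(LoadingTime|CorsSupport)$")


def should_insert(event):
    """Applied before the MySQL consumer: events for blacklisted schemas are
    discarded here, so their tables are never auto-created again."""
    return SCHEMA_BLACKLIST.match(event["schema"]) is None


print(should_insert({"schema": "ImageMetricsCorsSupport", "revision": 11686678}))  # False
print(should_insert({"schema": "SomeOtherSchema", "revision": 1}))                 # True
```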
Sounds good! Thank you @Nuria. Once it is merged I will drop it again and see what happens! If all goes well, we will be able to close this ticket \o/
Change 343809 merged by Ottomata:
[operations/puppet] Blacklisting ImageMetrics schemas
We have merged the change and restarted the processors; the tables can be deleted (cc @Marostegui ).
Awesome! Thank you very much.
I have done this on db1046, db1047 and dbstore1002:
```
root@EVENTLOGGING m4[log]> select timestamp from ImageMetricsCorsSupport_11686678 order by timestamp desc limit 5;
+----------------+
| timestamp      |
+----------------+
| 20170318233819 |
| 20170318205355 |
| 20170318205351 |
| 20170317043320 |
| 20170317043318 |
+----------------+
5 rows in set (0.00 sec)

root@EVENTLOGGING m4[log]> set session sql_log_bin=0;
Query OK, 0 rows affected (0.00 sec)

root@EVENTLOGGING m4[log]> rename table ImageMetricsCorsSupport_11686678 to TO_DROP_ImageMetricsCorsSupport_11686678;
Query OK, 0 rows affected (0.15 sec)
```
If the table isn't recreated by the end of the week, I will drop it for good and close this ticket!
Thank you guys!!!
So far so good!
```
root@EVENTLOGGING m4[log]> show tables like 'ImageMetricsCorsSupport%';
Empty set (0.00 sec)
```
Let's give it a few more days
Bad news, the table is there again @Nuria :-(
```
root@EVENTLOGGING m4[log]> select @@hostname;
+------------+
| @@hostname |
+------------+
| db1046     |
+------------+
1 row in set (0.00 sec)

root@EVENTLOGGING m4[log]> show tables like 'ImageMetricsCorsSupport%';
+------------------------------------------+
| Tables_in_log (ImageMetricsCorsSupport%) |
+------------------------------------------+
| ImageMetricsCorsSupport_11686678         |
+------------------------------------------+
1 row in set (0.00 sec)

root@EVENTLOGGING m4[log]> select timestamp from ImageMetricsCorsSupport_11686678;
+----------------+
| timestamp      |
+----------------+
| 20170318233820 |
+----------------+
1 row in set (0.00 sec)
```
Note that the record is from 20170318, a timestamp before the blacklisting change was merged (2017-03-21). Database insertion on EL is not real-time, so I think either: 1) the record was enqueued in Kafka for insertion before the blacklisting (unlikely), or 2) it was inserted on the master and not replicated elsewhere until now.
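EventLogging timestamps are plain YYYYMMDDHHMMSS strings, so the comparison is easy to reproduce. An illustrative snippet (the merge time below is approximated to the date mentioned above):

```python
from datetime import datetime

blacklist_merged = datetime(2017, 3, 21)  # approximate merge time of the puppet change
row_ts = datetime.strptime("20170318233820", "%Y%m%d%H%M%S")  # the surviving row
print(row_ts < blacklist_merged)  # True: the row predates the blacklisting
```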
That output is from the master (db1046).
I can drop it again and see if it gets recreated during the weekend.
I dropped it on all the hosts:
```
root@neodymium:/home/marostegui/databases_s5# for i in db1046 db1047 dbstore1002; do echo $i; mysql --skip-ssl -h$i log -e "set session sql_log_bin=0; drop table if exists ImageMetricsCorsSupport_11686678;"; done
db1046
db1047
dbstore1002
```
Hmm, the master? Man, a Bermuda triangle problem. I was expecting this to have come from the slave. Triple-checking that there is not a typo in the blacklisting.
Maybe I am wrong; what I know about EL is that we have db1046, db1047 and dbstore1002, with db1046 being the master, but I could be wrong!
Anyways, it is now deleted, we will see what happens on Monday!
So far so good, but let's give it a couple more days:
```
root@EVENTLOGGING m4[log]> show tables like 'ImageMetricsCorsSupport%';
Empty set (0.00 sec)
```
```
mysql:research@analytics-store.eqiad.wmnet [log]> show tables like 'ImageMetricsCorsSupport%';
Empty set (0.02 sec)
```
I ran this query on the slave; all looks good. Closing ticket.