
Drop DB tables for now-deleted fixcopyrightwiki from production
Closed, Resolved (Public)


fixcopyrightwiki (on s3) has been dropped from appserver config and can now have its tables deleted. No back-up needed.

Event Timeline

Marostegui moved this task from Triage to Backlog on the DBA board.
Marostegui subscribed.

Let's truncate them, same as T227717#5806662

As this wiki has been removed everywhere, the check_private data alert fired to let us know there's "private" data on sanitarium hosts (and labs hosts), because this wiki doesn't show up on the dblists anymore.
We need to either exclude it from the check, or go ahead and drop it from the sanitarium master (with replication), which is db1112 for s3.

Mentioned in SAL (#wikimedia-operations) [2020-03-04T13:14:09Z] <marostegui> Drop fixcopyrightwiki from sanitarium hosts (db1112, db2074) to avoid getting the data alert - T246055

To keep this alert from firing, I have dropped this database on db1112 and db2074 (sanitarium masters) with replication enabled, so it has been dropped from sanitarium and labs hosts.
I took a backup (1.6M) of its tables just in case, which is temporarily stored at:

root@cumin1001:/home/marostegui/T246055# ls -lh
total 3.1M
-rw-r--r-- 1 root root 1.6M Mar  4 13:17 codfw_fixcopyrightwiki.sql
-rw-r--r-- 1 root root 1.6M Mar  4 13:15 eqiad_fixcopyrightwiki.sql

@Bstorm @JHedden @bd808 can you guys please remove the view for this database? It has already been deleted, but the views are still there and hence triggering the private data check.
I guess I can just issue a drop database fixcopyrightwiki_p directly, but I am wondering if this would be better done via maintain-views?

@Marostegui maintain-views can clean up views individually, but it won't drop the DB in any case. May as well just drop it by hand.

Thanks Brooke, I will drop them manually then.

Mentioned in SAL (#wikimedia-operations) [2020-03-11T07:38:23Z] <marostegui> fixcopyrightwiki_p views from labs hosts T246055

Done from the wikireplicas

root@cumin1001:~# for i in labsdb1009 labsdb1010 labsdb1011 labsdb1012; do echo $i; mysql -h$i -e "show databases like 'fixcopyright%'";done

We should not get more private data alerts.
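
For anyone repeating this check on another dropped wiki, here is a minimal sketch of the same loop as a function that reports a status per host. It assumes the standard `mysql` command-line client; the function name `check_db_gone` and the `MYSQL_CLIENT` override are hypothetical and not something used in this task.

```shell
# Hypothetical helper: report, per host, whether any database matching a
# LIKE pattern still exists. Returns non-zero if any host is not clean.
check_db_gone() {
    local pattern="$1"
    shift
    local host remaining status=0
    for host in "$@"; do
        # Assumption: MYSQL_CLIENT can point at a wrapper; defaults to mysql.
        # -N drops the column header, -B gives plain tab-separated output.
        if ! remaining="$("${MYSQL_CLIENT:-mysql}" -h "$host" -N -B \
                -e "SHOW DATABASES LIKE '${pattern}'")"; then
            echo "$host: query failed"
            status=1
            continue
        fi
        if [ -n "$remaining" ]; then
            echo "$host: still has $remaining"
            status=1
        else
            echo "$host: clean"
        fi
    done
    return "$status"
}
```

It would be called as `check_db_gone 'fixcopyright%' labsdb1009 labsdb1010 labsdb1011 labsdb1012`; an empty result on every host corresponds to the empty output of the loop above.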

Is there anything that still needs to be done on this task?

Everything :)
We just removed the tables from labs infra.
Dropping wikis isn't something trivial.
We could truncate them though.

@Jdforrester-WMF like we've done with some other wikis in the past, can we just truncate the tables and consider this done?

(not to do so)
T169928: Evaluate how hard would be to get aa(wikibooks|wiktionary) and howiki databases deleted
T227717: Drop DB tables for now-deleted zerowiki from production
(to do so but first rename them)
T260112: Remove muswiki and mhwiktionary from s3

Personally I support the latter: first rename the database of each deleted wiki, then delete it. (The reason: deleted wikis may contain an outdated database schema, which may cause issues if the database is somehow used elsewhere.)

We cannot rename a database; that's not supported by MySQL, unfortunately :-(

@Bugreporter what's the benefit of renaming if they need to be truncated anyways?

LSobanski lowered the priority of this task from Medium to Low. May 14 2021, 11:33 AM

I am going to slowly start truncating tables.

This needs to be done on a per-host basis, as the tables were already dropped from the sanitarium masters (T246055#5941020). Otherwise, replication will break.
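
A rough sketch of what per-host truncation could look like, under stated assumptions: the thread does not say how the statements were actually issued, so disabling `sql_log_bin` for the session (so the TRUNCATEs are not written to the binary log and do not replicate) is an assumption, as are the function name `gen_truncate_sql` and the two example table names.

```shell
# Hypothetical helper: read table names on stdin and print the SQL that would
# truncate each of them in the given database on a single host.
gen_truncate_sql() {
    local db="$1"
    local table
    # Assumption: binary logging is switched off for this session only, so the
    # statements stay local to the host they are run on.
    printf 'SET SESSION sql_log_bin = 0;\n'
    while IFS= read -r table; do
        [ -n "$table" ] || continue   # skip blank lines
        printf 'TRUNCATE TABLE `%s`.`%s`;\n' "$db" "$table"
    done
}

# Illustrative input; the real list would come from information_schema.tables.
printf 'page\nrevision\n' | gen_truncate_sql fixcopyrightwiki
```

The generated statements can be reviewed first and then piped to the client on one host at a time, which keeps the operation off the replication stream.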

I have truncated all the tables in codfw.
Before doing so I took a quick mysqldump and left it at:

root@cumin1001:/home/marostegui/fixcopyrightwiki# ls -lh
total 1.6M
-rw-r--r-- 1 root root 1.6M Jun 11 07:23 fixcopyrightwiki_db2105.sql

eqiad truncate progress

  • dbstore1007
  • dbstore1004
  • db1179
  • db1175
  • db1171
  • db1166
  • db1157
  • db1123
  • db1102