Page MenuHomePhabricator

Drop DB tables for now-deleted zerowiki from production
Open, LowPublic

Description

zerowiki (on s3) has been dropped from appserver config and can now have its tables deleted. (A backup of the tables just in case would be great.)

Related Objects

View Standalone Graph
This task is connected to more than 200 other tasks. Only direct parents and subtasks are shown here. Use View Standalone Graph to show more of the graph.

Event Timeline

Restricted Application added a subscriber: Aklapper. · View Herald TranscriptJul 10 2019, 10:31 PM
jcrespo triaged this task as Low priority.Jul 11 2019, 3:34 PM
jcrespo moved this task from Triage to Backlog on the DBA board.
jcrespo added subscribers: Marostegui, jcrespo.

Needs some research to see if it is safe. Low priority because it shouldn't block any other task.

MaxSem added a subscriber: MaxSem.Jul 12 2019, 1:12 AM

Privacy review

Database tables:

MariaDB [zerowiki]> show tables;
+--------------------------+
| Tables_in_zerowiki       |
+--------------------------+
| abuse_filter             |
| abuse_filter_action      |
| abuse_filter_history     |
| abuse_filter_log         |
| actor                    |
| archive                  |
| babel                    |
| betafeatures_user_counts |
| bot_passwords            |
| bv2015_edits             |
| bv2017_edits             |
| category                 |
| categorylinks            |
| change_tag               |
| change_tag_def           |
| comment                  |
| content                  |
| content_models           |
| cu_changes               |
| cu_log                   |
| externallinks            |
| filearchive              |
| filejournal              |
| geo_tags                 |
| global_block_whitelist   |
| globalblocks             |
| hidden                   |
| image                    |
| imagelinks               |
| interwiki                |
| ip_changes               |
| ipblocks                 |
| ipblocks_restrictions    |
| iwlinks                  |
| job                      |
| l10n_cache               |
| langlinks                |
| linter                   |
| log_search               |
| logging                  |
| math                     |
| mathoid                  |
| module_deps              |
| oathauth_users           |
| objectcache              |
| oldimage                 |
| page                     |
| page_props               |
| page_restrictions        |
| pagelinks                |
| protected_titles         |
| querycache               |
| querycache_info          |
| querycachetwo            |
| recentchanges            |
| redirect                 |
| revision                 |
| revision_actor_temp      |
| revision_comment_temp    |
| searchindex              |
| securepoll_cookie_match  |
| securepoll_elections     |
| securepoll_entity        |
| securepoll_lists         |
| securepoll_msgs          |
| securepoll_options       |
| securepoll_properties    |
| securepoll_questions     |
| securepoll_strike        |
| securepoll_voters        |
| securepoll_votes         |
| site_identifiers         |
| site_stats               |
| sites                    |
| slot_roles               |
| slots                    |
| spoofuser                |
| templatelinks            |
| text                     |
| transcache               |
| transcode                |
| updatelog                |
| uploadstash              |
| user                     |
| user_former_groups       |
| user_groups              |
| user_newtalk             |
| user_properties          |
| watchlist                |
+--------------------------+
89 rows in set (0.00 sec)

The main places where data we can't keep for more than 90 days is stored are cu_*, recentchanges and abuse_*. All these tables are already empty. Data in ip_changes is not private but only has 2 edits by localhost made during installation anyway:

MariaDB [zerowiki]> select * from ip_changes;
+------------+-------------------+----------+
| ipc_rev_id | ipc_rev_timestamp | ipc_hex  |
+------------+-------------------+----------+
|          1 | 20140401184017    | 7F000001 |
|          2 | 20140401184018    | 7F000001 |
+------------+-------------------+----------+
2 rows in set (0.00 sec)

Unless someone knows something else, I'd say it's safe to store a backup of this database for a long time.

I'd say it's safe to store a backup

You probably weren't answering me, but safe on my previous comment was meant not to the backup itself, but to the drop. Inter-wiki dependencies (commons, wikidata, interwikis, etc. are known to create outages on other wikis in the past, and there is no existing procedure in place for that. The drop & archive is trivial. Doing it without affecting other wikis may be more complicated- e.g. random extension that setups a job or a cron, or cross-dc queries, or authentication strings related to centralauth that breaks on all wikis because one is "missing"; that is why it needs some careful procedure.

In fact, https://wikitech.wikimedia.org/wiki/Delete_a_wiki says:

When we delete a wiki, the steps outlined below need to be performed, which makes MediaWiki claim that the wiki does not exist. Actually deleting the data from all of the database servers would be quite a bit more complicated, and is usually not worth doing.

But there is always a first, if someone volunteers to help! :-D

Krinkle added a subscriber: Krinkle.EditedJul 19 2019, 6:49 PM

Perhaps a compromise would be to "backup and truncate" instead of "backup and drop". That would mean any attempted query would still superficially pass. One could then also revoke the write permission for that db from the MW user to ensure we get hard failures for any new information attempted to be inserted by some means.

Actually, db grants might also be a good first step from the other direction – instead of truncating, revoke both the read and write grant from the MW user. That step would be easy to reverse, but if without issue for weeks, could then lead to a proper drop without much risk.

Krinkle removed a subscriber: Krinkle.Jul 20 2019, 4:28 PM

Actually, db grants might also be a good first step from the other direction – instead of truncating, revoke both the read and write grant from the MW user. That step would be easy to reverse, but if without issue for weeks, could then lead to a proper drop without much risk.

This might be hard to do, as the way we have grants wouldn't allow us to revoke grants from an specific wiki, as we grant them for 'wik%', so we'd need to basically change them add all the wikis individually and then later, remove them from this specific wiki.