Page MenuHomePhabricator

Drop *_old database tables from Wikimedia wikis
Closed, ResolvedPublic

Description

On at least enwiki, there are a number of duplicative *_old database tables that appear to be unused and unneeded.

There are eight such tables:

  • archive_old
  • categorylinks_old
  • image_old
  • imagelinks_old
  • logging_old
  • oldimage_old
  • querycache_old
  • user_old

These database tables should be dropped from all Wikimedia wikis.


Version: wmf-deployment
Severity: enhancement

Details

Reference
bz52932

Event Timeline

bzimport raised the priority of this task from to Medium.Nov 22 2014, 1:51 AM
bzimport set Reference to bz52932.
alex@alex-laptop:~$ ssh tin.eqiad.wmnet -t sql enwiki -s
mysql> show tables like '%\_old';
Tables_in_enwiki (%\_old)
optin_survey_old
mysql>

So I guess most of the ones listed were already dealt with.

It sounds like we should just resolve this and deal with optin_survey_old as part of T54934 instead

I just made a backup of optin_survey_old at iron:/home/jynus and deleted it from enwiki.

Nothing seems broken.

Should I continue with the other 826 wikis that also have that table? Is data there safe to be fully eliminated without archival?

I've made a list of *old tables on the master:

{P1896}

Probably many people before me found these tables and didn't have enough confidence to delete them. My plan is the following:

  • Perform a backup of these >5000 tables
  • Delete them on the master with IF EXISTS
  • Wait for the world to collapse

While this may seem like a low importance task, it will improve greatly our ability to checksum all database objects. So I would like to resolve it soon. I will wait for some days for people to review the list, then proceed.

I've performed a backup in iron and then deleted all delete*old tables from shards 1-7.

Planned for tomorrow, optin_survey_old tables. But let's check that this has effectively not affected any service.

I've closed T54934 by deleting the table optin_survey_old from all wikis (also, after performing a backup). These are the *old tables still on production:

angwikisource.ipblocks_old
arwikinews.ipblocks_old
azwikisource.ipblocks_old
bat_smgwiki.ipblocks_old
bgwikisource.ipblocks_old
boardwiki.ipblocks_old
bswikisource.ipblocks_old
bugwiki.ipblocks_old
cawikinews.ipblocks_old
cawikisource.ipblocks_old
chapcomwiki.ipblocks_old
chwikimedia.ipblocks_old
comcomwiki.ipblocks_old
cswikisource.ipblocks_old
cywikisource.ipblocks_old
devwikiinternal.old
etwikisource.ipblocks_old
fawikisource.ipblocks_old
fiwikisource.ipblocks_old
fowikisource.ipblocks_old
frpwiki.ipblocks_old
hewikinews.ipblocks_old
htwikisource.ipblocks_old
huwikisource.ipblocks_old
idwikisource.ipblocks_old
ilowiki.ipblocks_old
incubatorwiki.ipblocks_old
iswikisource.ipblocks_old
knwikisource.ipblocks_old
kshwiki.ipblocks_old
ladwiki.ipblocks_old
lijwiki.ipblocks_old
lmowiki.ipblocks_old
ltwikisource.ipblocks_old
map_bmswiki.ipblocks_old
mlwikisource.ipblocks_old
napwiki.ipblocks_old
nds_nlwiki.ipblocks_old
nowikinews.ipblocks_old
nowikisource.ipblocks_old
nrmwiki.ipblocks_old
nzwikimedia.ipblocks_old
papwiki.ipblocks_old
pdcwiki.ipblocks_old
pihwiki.ipblocks_old
pmswiki.ipblocks_old
rel13testwiki.old
rmywiki.ipblocks_old
ruwikinews.ipblocks_old
skwikisource.ipblocks_old
slwikisource.ipblocks_old
spcomwiki.ipblocks_old
testwiki.ipblocks_old
tetwiki.ipblocks_old
tewikisource.ipblocks_old
thwikinews.ipblocks_old
thwikisource.ipblocks_old
trwikisource.ipblocks_old
udmwiki.ipblocks_old
ukwikisource.ipblocks_old
vecwiki.ipblocks_old
viwikisource.ipblocks_old
vlswiki.ipblocks_old
warwiki.ipblocks_old
wikimania2005wiki.image_old
wikimania2005wiki.ipblocks_old
wikimania2005wiki.logging_old
wikimania2005wiki.oldimage_old
wikimania2005wiki.user_old
wikimania2005wiki.watchlist_old
wikimania2006wiki.ipblocks_old
wikimaniateamwiki.ipblocks_old
xalwiki.ipblocks_old
yiwikisource.ipblocks_old
zh_cnwiki.old
zh_min_nanwikisource.ipblocks_old
zh_yuewiki.ipblocks_old
zhwikinews.ipblocks_old

Mostly, the table ipblocks_old, tables on wikimania2005, and "old" on selected wikis. Will wait a bit to double confirm those are not needed (they are not in the original description for this ticket). @MZMcBride, @Krenair, any comments?

I'd prefer that we kill them all.

devwikiinternal.old, rel13testwiki.old, zh_cnwiki.old
I'm not sure these ones should be deleted. These wikis are 'deleted' in the sense that MediaWiki no longer exposes them to users, however as far as I know the DBs are not typically deleted. These specific tables were used on very old (MW 1.4) sites.

jcrespo changed the task status from Open to Stalled.Sep 9 2015, 3:39 PM
jcrespo lowered the priority of this task from Medium to Low.
jcrespo moved this task from In progress to Backlog on the DBA board.

I just came across enwikisource.logging_pre_1_10, which has over 130k rows

It wouldn't surprise me if there's similar tables on other wikis

devwikiinternal.old, rel13testwiki.old, zh_cnwiki.old
I'm not sure these ones should be deleted. These wikis are 'deleted' in the sense that MediaWiki no longer exposes them to users, however as far as I know the DBs are not typically deleted. These specific tables were used on very old (MW 1.4) sites.

Mmm. Certainly the ipblocks_old ones can die in a fire

jcrespo changed the task status from Stalled to Open.Nov 16 2015, 12:30 PM
jcrespo moved this task from Backlog to In progress on the DBA board.

These are the tables that are still on one of the masters and that end in old:

mysql -A -BN information_schema -e "SELECT CONCAT(table_schema, '.', table_name) FROM information_schema.tables WHERE table_name like '%old' AND (table_schema like '%wik%' OR table_schema like '%auth%')" -h <all-s-masters>
chwikimedia.ipblocks_old
comcomwiki.ipblocks_old
devwikiinternal.old
rel13testwiki.old
wikimania2005wiki.image_old
wikimania2005wiki.logging_old
wikimania2005wiki.oldimage_old
wikimania2005wiki.user_old
wikimania2005wiki.watchlist_old
zh_cnwiki.old

chwikimedia and comcomwiki have not been touched either by error or, most likely, due to them not being on any dblist.

I would propose to resolve this ticket, after dropping >5000 tables; set individual tickets -if needed- for the pending dd/tables. I do not care if there is a couple of tables that shouldn't be there- they only affect performance/operations when in large numbers.

Good work! :)

chwikimedia and comcomwiki are both considered "deleted". zh_cnwiki too.

zh_cnwiki the old table may still have some value; depending on when it was "deleted" data may or may not have been migrated from it

I suspect the entire database for rel13testwiki (presumably a testwiki for MW 1.3) can be deleted. Probably similar for devwikiinternal... Not sure what that is. @tstarling or @brion might be able to advise about those.

The 7 _old tables could be considered under the scope of this task... But could as easily be migrated to something else

I have no record of rel13testwiki or devwikiinternal; would assume they're junk but it might be worth double-checking their content before trashing them.

mysql:wikiadmin@db1044 [rel13testwiki]> show tables;
+-------------------------+
| Tables_in_rel13testwiki |
+-------------------------+
| archive                 |
| blobs                   |
| brokenlinks             |
| categorylinks           |
| cur                     |
| hitcounter              |
| image                   |
| imagelinks              |
| interwiki               |
| ipblocks                |
| linkscc                 |
| math                    |
| objectcache             |
| old                     |
| oldimage                |
| querycache              |
| recentchanges           |
| searchindex             |
| site_stats              |
| user                    |
| user_newtalk            |
| watchlist               |
+-------------------------+
22 rows in set (0.00 sec)

mysql:wikiadmin@db1044 [rel13testwiki]>

cur rows are from 2004

mysql:wikiadmin@db1044 [devwikiinternal]> show tables;
+---------------------------+
| Tables_in_devwikiinternal |
+---------------------------+
| archive                   |
| blobs                     |
| brokenlinks               |
| categorylinks             |
| cur                       |
| hitcounter                |
| image                     |
| imagelinks                |
| interwiki                 |
| ipblocks                  |
| links                     |
| linkscc                   |
| logging                   |
| math                      |
| objectcache               |
| old                       |
| oldimage                  |
| querycache                |
| recentchanges             |
| site_stats                |
| user                      |
| user_newtalk              |
| user_rights               |
| validate                  |
| watchlist                 |
+---------------------------+
25 rows in set (0.00 sec)

mysql:wikiadmin@db1044 [devwikiinternal]>

cur rows are from 2005 and look to be a lot of MW namespace pages...

Yeah those sound safe to remove!

I'll put them in a new request per Jaimes request

What about the rest (they are not on a db list)?

chwikimedia.ipblocks_old
comcomwiki.ipblocks_old

wikimania2005wiki tables *_old

zh_cnwiki.old

Do I leave them there as I proposed for now?

What about the rest (they are not on a db list)?

chwikimedia.ipblocks_old
comcomwiki.ipblocks_old

wikimania2005wiki tables *_old

zh_cnwiki.old

Do I leave them there as I proposed for now?

chwikimedia.ipblocks_old and comcomwiki.ipblocks_old can definitely go. They're "deleted" but tables would've been backups.

wikimania2005wiki is in all.dblist... (Where were you looking?) It's readonly, but still exposed. The _old tables there would've been backups pre an upgrade. They can go.

I would leave the zh_chnwiki.old for now. Would have to dig into when that was deleted, and as such, whether the old table has any value

After all the deletions that have happened lately as part of the parent ticket, this is the current status of these tables.
They only exist on s3 on:

chwikimedia.ipblocks_old
comcomwiki.ipblocks_old
wikimania2005wiki.image_old
wikimania2005wiki.logging_old
wikimania2005wiki.oldimage_old
wikimania2005wiki.watchlist_old

They have not been written since 2015, so I will get rid of them.

Marostegui claimed this task.

After all the deletions that have happened lately as part of the parent ticket, this is the current status of these tables.
They only exist on s3 on:

chwikimedia.ipblocks_old
comcomwiki.ipblocks_old
wikimania2005wiki.image_old
wikimania2005wiki.logging_old
wikimania2005wiki.oldimage_old
wikimania2005wiki.watchlist_old

They have not been written since 2015, so I will get rid of them.

Those have been cleaned up