
Investigate the unusual dbs in s3
Closed, Declined, Public

Description

While fixing T296537, I found an extra grant on the letter A, giving access to basically any database that has the letter a in its name. It would have been a useless grant that could simply be removed on the grounds that it's covered by the centralauth and %wik% grants, and that is the case for s7. But most of s3 also has this grant (only for one range, though: 10.64.%, not 10.192.%, unlike s7), and removing it also removes access to databases in s3 that don't follow the %wik% pattern. These databases are:

  • blocker
  • boards
  • boardvote
  • boardvote2005
  • boardvote2006
  • boardvote2007_test
  • boardvotetest
  • defoundation
  • heartbeat
    • Needed
  • information_schema
    • Needed
  • jamestemp
  • katesdb
  • mysql
    • Needed
  • oai
  • ops
    • Needed
  • performance_schema
    • Needed
  • steward
  • sys
    • Needed
  • webshop
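To make the overlap concrete, here is a rough Python sketch (a hypothetical helper, not existing WMF tooling) that checks which database names would no longer be reachable through the remaining grants if the extra grant were dropped. It assumes MySQL/MariaDB database-level grant wildcard rules (`%` matches any run of characters, `_` matches exactly one, a backslash escapes) and case-sensitive database names, as with `lower_case_table_names=0`. The sample names are taken from the task and the list above.

```python
from __future__ import annotations

import re


def grant_pattern_to_regex(pattern: str) -> "re.Pattern[str]":
    """Translate a database-level GRANT pattern into a compiled regex.

    Assumed rules: '%' matches any run of characters, '_' matches exactly
    one character, and a backslash makes the next character literal.
    """
    parts = []
    i = 0
    while i < len(pattern):
        ch = pattern[i]
        if ch == "\\" and i + 1 < len(pattern):
            parts.append(re.escape(pattern[i + 1]))  # escaped char is literal
            i += 2
            continue
        if ch == "%":
            parts.append(".*")
        elif ch == "_":
            parts.append(".")
        else:
            parts.append(re.escape(ch))
        i += 1
    return re.compile("".join(parts), re.DOTALL)


def covered(db: str, patterns: list[str]) -> bool:
    """True if any grant pattern matches the whole database name (case-sensitive)."""
    return any(grant_pattern_to_regex(p).fullmatch(db) for p in patterns)


def uncovered(dbs: list[str], patterns: list[str]) -> list[str]:
    """Databases that none of the given grant patterns would reach."""
    return [db for db in dbs if not covered(db, patterns)]


if __name__ == "__main__":
    remaining = ["centralauth", "%wik%"]  # grants left after dropping the extra one
    sample = ["enwiki", "bgwikinews", "centralauth", "boardvote2005", "katesdb", "ops"]
    print(uncovered(sample, remaining))  # ['boardvote2005', 'katesdb', 'ops']
```

Note that `_` is itself a wildcard in database-level grants, so a grant on `boardvote2007_test` would also match a name like `boardvote2007xtest` unless the underscore is escaped.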

Do we need these databases? Does wikiuser need access to them? If so, why only on half of the IP ranges? We can certainly keep them, but I'm not sure it makes sense that only half of wikiuser (one of the two ranges) can query them.

Note: wikiadmin doesn't have such grants, meaning any maintenance script run against these dbs would fail. If the access is needed, it should be added to wikiadmin as well.

Event Timeline

Btw, this is the list of all s3 dbs that are not in the s3 dblist (and I doubt MediaWiki could route requests to them):

affcomwiki
alswikibooks
alswikiquote
alswiktionary
bawiktionary
blocker
boards
boardvote
boardvote2005
boardvote2006
boardvote2007_test
boardvotetest
chwikimedia
closed_zh_twwiki
comcomwiki
de_labswikimedia
defoundation
dkwiki
dkwikibooks
dkwiktionary
en_labswikimedia
fixcopyrightwiki
flaggedrevs_labswikimedia
heartbeat
information_schema
jamestemp
katesdb
langcomwiki
liquidthreads_labswikimedia
mowiki
mowiktionary
mysql
noboardwiki
oai
ops
performance_schema
readerfeedback_labswikimedia
ru_sibwiki
sep11wiki
steward
strategyappswiki
sys
tlhwiki
tlhwiktionary
tokiponawiki
tokiponawikibooks
tokiponawikiquote
tokiponawiktionary
ukwikimedia
vewikimedia
webshop
wikiconfig
wikimania
zerowiki
zh_cnwiki
zh_twwiki
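A list like this boils down to a set difference between the databases present on a host (for example, the output of `SHOW DATABASES`) and the section's dblist. The Python sketch below assumes a dblist format of one database name per line and skips blank lines and `#` lines as a precaution; how the host's database list is fetched is left out, and the function names are hypothetical.

```python
from __future__ import annotations

from typing import Iterable


def read_dblist(text: str) -> set[str]:
    """Parse dblist contents: one database name per line.

    Blank lines and lines starting with '#' are skipped as a precaution.
    """
    names = set()
    for line in text.splitlines():
        line = line.strip()
        if line and not line.startswith("#"):
            names.add(line)
    return names


def extra_databases(on_host: Iterable[str], dblist_text: str) -> list[str]:
    """Databases present on the host but absent from the section's dblist, sorted."""
    return sorted(set(on_host) - read_dblist(dblist_text))


if __name__ == "__main__":
    host_dbs = ["enwiki", "bgwikinews", "tokiponawiki", "katesdb", "mysql"]
    dblist = "enwiki\nbgwikinews\n"
    print(extra_databases(host_dbs, dblist))  # ['katesdb', 'mysql', 'tokiponawiki']
```

On a real host, `on_host` would come from `SHOW DATABASES` and `dblist_text` from s3.dblist; schemas such as mysql and sys will always show up in the difference, which is why they appear in the list above.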

It seems the tokipona wikis were "deleted" in 2010 (T13511: Delete tokipona wikis), but their dbs were not dropped.

I think we should clean them up, but there are a number of databases of deleted wikis, and the DBAs think deleting them would be troublesome. See T246055: Drop DB tables for now-deleted fixcopyrightwiki from production

Aklapper renamed this task from Investigate the unusal dbs in s3 to Investigate the unusual dbs in s3. Dec 9 2021, 7:26 AM

sys schema is https://mariadb.com/kb/en/sys-schema/ and should be on all wmf databases, and the admin user should have access to it on mw databases.
ops holds the logs and functions of the query killer, and should be on all mw databases (at least for now).
You should check against closed.dblist. I have been an advocate that, if those were to be kept, they shouldn't be on s3 but on a hypothetical, very small "s0" section (e.g. on a VM) to save resources.
Others will be relics of past maintenance jobs or of wikis created by mistake.

> sys schema is https://mariadb.com/kb/en/sys-schema/ and should be on all wmf databases, and the admin user should have access to it on mw databases.

Yup, that's why I marked it as needed. It's there for the sake of completeness.

> ops holds the logs and functions of the query killer, and should be on all mw databases (at least for now).

I wasn't sure about this one; if so, I'll mark it as needed.

> You should check against closed.dblist. I have been an advocate that, if those were to be kept, they shouldn't be on s3 but on a hypothetical, very small "s0" section (e.g. on a VM) to save resources.
> Others will be relics of past maintenance jobs or of wikis created by mistake.

Well, closed dbs are still in the s3 dblist. For example, bgwikinews is in both s3.dblist and closed.dblist. From MediaWiki's point of view, closed wikis are fully functional wikis that get software updates, schema updates and so on. They just don't let anyone (except stewards) edit them.

Deleted wikis, on the other hand, are a completely different story: the deletion process has changed over the years, so we are now in a state where each deleted wiki is its own unique mess. The other problem is that not all of these dbs are wiki dbs. katesdb, for example, contains random tables with random information, which makes schema changes and other maintenance work on s3 tricky, and I sure hope it doesn't get replicated to the cloud. We don't know what's in them.

Sorry, when I said closed, I really meant deleted.dblist.
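Following that suggestion, the leftover databases could be bucketed roughly as in the Python sketch below: the schemas marked as needed in the task description, names found in deleted.dblist, and everything else (candidates for manual inspection, like katesdb). The helper and its bucket names are hypothetical; the deleted set is passed in rather than read from any particular file, and the sample value is a placeholder, not the real contents of deleted.dblist.

```python
from __future__ import annotations

from typing import Iterable

# Schemas marked as "Needed" in the task description.
NEEDED_SCHEMAS = {"heartbeat", "information_schema", "mysql", "ops", "performance_schema", "sys"}


def classify(extra_dbs: Iterable[str], deleted: set[str]) -> dict[str, list[str]]:
    """Bucket leftover databases for follow-up.

    'needed'       -> operational/system schemas that must stay
    'deleted_wiki' -> names found in deleted.dblist (passed in as a set)
    'unknown'      -> everything else, to be inspected by hand
    """
    groups: dict[str, list[str]] = {"needed": [], "deleted_wiki": [], "unknown": []}
    for db in sorted(extra_dbs):
        if db in NEEDED_SCHEMAS:
            groups["needed"].append(db)
        elif db in deleted:
            groups["deleted_wiki"].append(db)
        else:
            groups["unknown"].append(db)
    return groups


if __name__ == "__main__":
    leftovers = ["ops", "katesdb", "tokiponawiki", "mysql"]
    deleted_sample = {"tokiponawiki"}  # illustrative, not the real deleted.dblist
    print(classify(leftovers, deleted_sample))
    # {'needed': ['mysql', 'ops'], 'deleted_wiki': ['tokiponawiki'], 'unknown': ['katesdb']}
```

The "unknown" bucket is where the non-wiki dbs mentioned above would end up, since neither list says anything about what they contain.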

Some argue that deleting databases may cause issues (T227717#5327365), but keeping them may also cause issues, possibly harder to detect ones: code may try to read a new schema from the truncated (but not dropped) database of a deleted wiki whose schema is never updated.

There are pros and cons, and I think this needs to be discussed with the other DBAs once they are back (and in a bigger setting) to find a solution. Maybe different dbs need different solutions, etc.

Marostegui moved this task from Triage to Refine on the DBA board.
Marostegui added a subscriber: Marostegui.

The problem with those dbs is whether they ever made it to the external store hosts, and what's in there. Over the years we've been unsure whether we can easily delete them or whether they could cause hidden issues. While I think deleting stuff like katesdb is fine, I am not so sure about things like alswiktionary and friends. I'd much rather truncate their tables than issue a full DROP DATABASE.
Another thing that we could do would be to simply rename their tables and see if something breaks.
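The rename idea could be prepared as a reversible dry run: generate the RENAME TABLE statements together with their inverses, review them, and keep the inverses at hand in case something breaks. The Python sketch below only builds SQL strings and executes nothing; the `_renamed_` prefix is a hypothetical naming choice, and in practice the table list would come from the server (e.g. information_schema.TABLES) rather than being hard-coded.

```python
from __future__ import annotations

MAX_IDENTIFIER_LEN = 64  # MySQL/MariaDB limit for table names


def quote_ident(name: str) -> str:
    """Quote an identifier with backticks, doubling any embedded backtick."""
    return "`" + name.replace("`", "``") + "`"


def rename_statements(db: str, tables: list[str], prefix: str = "_renamed_") -> tuple[list[str], list[str]]:
    """Build RENAME TABLE statements plus their inverses; nothing is executed.

    The prefix is a hypothetical naming convention for the dry run.
    """
    forward, rollback = [], []
    for table in tables:
        new_name = prefix + table
        if len(new_name) > MAX_IDENTIFIER_LEN:
            raise ValueError(f"renamed table name too long: {new_name!r}")
        old = f"{quote_ident(db)}.{quote_ident(table)}"
        new = f"{quote_ident(db)}.{quote_ident(new_name)}"
        forward.append(f"RENAME TABLE {old} TO {new};")
        rollback.append(f"RENAME TABLE {new} TO {old};")
    return forward, rollback


if __name__ == "__main__":
    fwd, back = rename_statements("alswiktionary", ["page", "revision"])
    print("\n".join(fwd))
    print("\n".join(back))
```

Keeping the inverse statements alongside the forward ones is what makes this cheaper to undo than a TRUNCATE or DROP DATABASE: if something breaks, running the rollback list restores the original names.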

Anyway, although the list looks quite long, it is not that much compared to the number of wikis we have in s3, and given the risk/benefit of deleting them, I am not sure we should give this much priority.

I am going to close this as declined, I don't think it is worth the effort.