
Add 'centralauth' to meta_p.wiki so that apps can re-use the appropriate slice
Open, Low, Public

Description

Right now GUC uses a dedicated connection for centralauth via centralauth.web.db.svc, unlike queries for other databases (which use s#.web.db.svc).

It'd help if there is an entry for centralauth in meta_p.wiki logically mapping it to an existing slice (e.g. s1 or s7) so that apps can be written in a way that doesn't encourage new connections for centralauth.
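As a rough illustration of what this would enable, here is a minimal sketch of a tool grouping its databases by replica host so that centralauth shares the connection already opened for its slice. The rows, the `web_host` and `group_by_host` helpers, and the exact slice assignments are assumptions made for the example; only the `s#.web.db.svc` naming comes from the description above.

```python
from collections import defaultdict

# Hypothetical (dbname, slice) pairs in the shape meta_p.wiki might return.
# The centralauth row is the addition this task asks for; the slice
# assignments are illustrative only.
ROWS = [
    ("enwiki", "s1.labsdb"),
    ("metawiki", "s7.labsdb"),
    ("centralauth", "s7.labsdb"),
]


def web_host(slice_name):
    """Turn a slice value such as 's7.labsdb' into 's7.web.db.svc'."""
    section = slice_name.split(".", 1)[0]
    return f"{section}.web.db.svc"


def group_by_host(rows):
    """Map each replica host to the databases it serves."""
    hosts = defaultdict(list)
    for dbname, slice_name in rows:
        hosts[web_host(slice_name)].append(dbname)
    return dict(hosts)


plan = group_by_host(ROWS)
for host, dbnames in plan.items():
    # One connection per host would be enough for all of these databases.
    print(host, dbnames)
```

With such a mapping the tool opens one connection per host and issues its centralauth queries over the s7 connection instead of a dedicated one.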

Event Timeline

Krinkle triaged this task as High priority.
Krinkle raised the priority of this task from High to Needs Triage.
Krinkle created this task.
Krinkle added subscribers: jcrespo, bd808.

@Krinkle, in fact, replica maintenance scripts do fall into cloud team territory, so not much we can help here. But it seems like a reasonable request to me, centralauth should be on s7, and while I could manually add it easily, I prefer them to do it, as it probably needs code changes and would be lost on the next change.

centralauth isn't a wiki, so adding it to meta_p.wiki is a bit weird.

(u3518@s7.labsdb) [meta_p]> describe wiki;
+------------------+--------------+------+-----+---------+-------+
| Field            | Type         | Null | Key | Default | Extra |
+------------------+--------------+------+-----+---------+-------+
| dbname           | varchar(32)  | NO   | PRI | NULL    |       |
| lang             | varchar(12)  | NO   |     | en      |       |
| name             | text         | YES  |     | NULL    |       |
| family           | text         | YES  |     | NULL    |       |
| url              | text         | YES  |     | NULL    |       |
| size             | decimal(1,0) | NO   |     | 1       |       |
| slice            | text         | NO   |     | NULL    |       |
| is_closed        | decimal(1,0) | NO   |     | 0       |       |
| has_echo         | decimal(1,0) | NO   |     | 1       |       |
| has_flaggedrevs  | decimal(1,0) | NO   |     | 0       |       |
| has_visualeditor | decimal(1,0) | NO   |     | 0       |       |
| has_wikidata     | decimal(1,0) | NO   |     | 0       |       |
| is_sensitive     | decimal(1,0) | NO   |     | 0       |       |
+------------------+--------------+------+-----+---------+-------+
13 rows in set (0.00 sec)

So the row for centralauth would be something like:

*************************** 1. row ***************************
          dbname: centralauth
            lang: 
            name: 
          family: 
             url: 
            size: 
           slice: s7.labsdb
       is_closed: 0
        has_echo: 0
 has_flaggedrevs: 0
has_visualeditor: 0
    has_wikidata: 0
    is_sensitive: 0
1 row in set (0.00 sec)

That seems weird. Why not just make your app use s7 directly? Are you worried that the centralauth db will be moved to another slice without notice?

@bd808 Yeah, it's not a wiki. But if not in meta_p.wiki, then where? Perhaps a new meta for shared tables, e.g. meta_p.shared?

I'm not specifically worried about it moving to another slice. It's just that the information is domain specific and doesn't belong in individual tools. Discovery makes things easier to test locally as well. But yeah, also in case of change, whether or not with notice, I'd rather not have to chase down uses throughout different tools, frameworks and dependencies. For the same reasons we map wikis to slice in meta_p.wiki.

For what it's worth, it seems Toolserver did include centralauth in the wikis table. Last week I (finally) removed dead code from GUC that was explicitly filtering out centralauth with WHERE family != 'centralauth' when building a list of all wikis.

It's just that the information is domain specific and doesn't belong in individual tools.

Isn't meta_p.wiki domain specific as well?

For what it's worth, it seems Toolserver did include centralauth in the wikis table.

If we add non-wiki databases to the wiki table then everything that uses the table to iterate/discover wikis will need to know that there is some set of "family" values that refer to non-wiki databases. If we add a new table then we need to advertise and support that.
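To make that cost concrete, here is a small sketch using an in-memory SQLite database as a stand-in for meta_p. The reduced three-column schema, the row values and the 'centralauth' family marker are assumptions for illustration (the marker mirrors the old Toolserver filter mentioned above), not the real replica contents.

```python
import sqlite3

# In-memory stand-in for meta_p.wiki with only the columns needed here.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE wiki (dbname TEXT PRIMARY KEY, family TEXT, slice TEXT)"
)
conn.executemany(
    "INSERT INTO wiki VALUES (?, ?, ?)",
    [
        ("enwiki", "wikipedia", "s1.labsdb"),
        ("dewiki", "wikipedia", "s5.labsdb"),
        # Hypothetical non-wiki row, marked through the family column.
        ("centralauth", "centralauth", "s7.labsdb"),
    ],
)

# A consumer that treats every row as a wiki now also gets centralauth.
all_rows = [
    row[0] for row in conn.execute("SELECT dbname FROM wiki ORDER BY dbname")
]

# A consumer that knows the convention has to filter it out explicitly.
wikis_only = [
    row[0]
    for row in conn.execute(
        "SELECT dbname FROM wiki WHERE family != ? ORDER BY dbname",
        ("centralauth",),
    )
]

print(all_rows)
print(wikis_only)
```

Every tool that iterates the table would need the second form, and would need updating again whenever another non-wiki family value appears.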

The meta_p.wiki seems to be fulfilling multiple use cases which makes it a bit inconvenient for all use cases. Part of its purpose seems to be discovery (i.e. "what wikis are available by language or family"). Another is getting some sense of configuration for a given wiki (i.e. the "has_*" flags which are populated from Action API requests). The third is reference information: what is the canonical base URL for the wiki, what database server section (new/old preferred term for the "s" in s1.analytics.db.svc.eqiad.wmflabs) contains the wiki's schema.

I don't know exactly what the "right" solution to this problem is, but I'd like to hear more than a single consumer ask for change before we rush to add a non-wiki database to the table named "wiki". 60% of the responses to the 2017 Toolforge survey said they either never used or never even heard of Toolserver. That makes me cautious to use that past implementation as a guide for what is best to do today. I really wish we had better usage information for this table and other things so we could do a better job at reaching out to active users to discuss changes. I think we should at least post this task on the cloud mailing list so that a few other people might read it and chime in with whether this change would help or hurt their usage of the data.

I would suggest splitting the table into two: one that has a list of wikis (in the current setup all columns except slice) and one that lists databases and their properties (dbname, slice). The original wiki table can then be replaced with a view that performs the relevant join to maintain backwards compatibility.
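A minimal sketch of that split, again using in-memory SQLite as a stand-in for the replica's meta_p. The table names `wiki_info` and `database_slice`, the reduced column set and the row values are all hypothetical; the point is only that a `wiki` view can keep existing queries working while non-wiki databases live in the second table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    -- Wiki-only properties: every current column except slice (abridged).
    CREATE TABLE wiki_info (
        dbname TEXT PRIMARY KEY,
        lang   TEXT,
        family TEXT,
        url    TEXT
    );

    -- Every database, wiki or not, with the slice that hosts it.
    CREATE TABLE database_slice (
        dbname TEXT PRIMARY KEY,
        slice  TEXT NOT NULL
    );

    -- Backwards-compatible replacement for the old wiki table.
    CREATE VIEW wiki AS
        SELECT w.dbname, w.lang, w.family, w.url, d.slice
        FROM wiki_info AS w
        JOIN database_slice AS d ON d.dbname = w.dbname;

    INSERT INTO wiki_info
        VALUES ('enwiki', 'en', 'wikipedia', 'https://en.wikipedia.org');
    INSERT INTO database_slice
        VALUES ('enwiki', 's1'), ('centralauth', 's7');
    """
)

# Existing consumers keep querying "wiki" and never see centralauth.
wiki_rows = conn.execute("SELECT dbname, slice FROM wiki").fetchall()

# New consumers can look up any database, including non-wiki ones.
centralauth_slice = conn.execute(
    "SELECT slice FROM database_slice WHERE dbname = ?", ("centralauth",)
).fetchone()[0]

print(wiki_rows, centralauth_slice)
```

Because the view uses an inner join against the wiki-only table, centralauth never shows up for tools that iterate `wiki`, which addresses the earlier concern about discovery.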

I would suggest splitting the table into two: one that has a list of wikis (in the current setup all columns except slice) and one that lists databases and their properties (dbname, slice). The original wiki table can then be replaced with a view that performs the relevant join to maintain backwards compatibility.

+1

This might also be a good time to get rid of the '.labsdb' postfix in the 'slice' column (see https://wikitech.wikimedia.org/w/index.php?title=Help_talk:Toolforge/Database&oldid=1791013#Identifying_lag for discussion (and possibly some later edits)).

wikis table:

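+--------+-----------+------+----------+--------+
| wiki   | family    | lang | database | ...etc |
+--------+-----------+------+----------+--------+
| enwiki | wikipedia | en   | enwiki_p | ...etc |
+--------+-----------+------+----------+--------+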

databases table:

+----------+--------+
| database | shard  |
+----------+--------+
| meta_p   | labsdb |
| enwiki_p | s1     |
+----------+--------+

This would still require the user to hard-code the '.web.db.svc' postfixes in their code. Alternatively, we could even add

shards table:

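+--------+-----------+-----------------+
| shard  | realtime  | analytics       |
+--------+-----------+-----------------+
| labsdb | ...       | ...             |
| s1     | s1.web.db | s1.analytics.db |
+--------+-----------+-----------------+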

or...

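+-------+---------------------+------------------+---------------+------------------+
| shard | hostname            | typical_replag_s | max_runtime_s | other parameters |
+-------+---------------------+------------------+---------------+------------------+
| s1    | s1.web.db.svc       | 3                | 30            | ...              |
| s1    | s1.analytics.db.svc | 60               | 3600          | ...              |
+-------+---------------------+------------------+---------------+------------------+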

where a tool can use the parameters to choose the most appropriate database server, which means tools could automatically switch over if a third tier is introduced. But I think this is probably not worth the effort -- most people will just hardcode web or analytics anyway, and this would just be a large amount of maintenance work added.
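For reference, the parameter-based selection could look roughly like the sketch below. The `ShardHost` type and the `pick_host` function are hypothetical, and the example values assume the two sample rows above read as 3 s / 30 s for the web host and 60 s / 3600 s for the analytics host.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ShardHost:
    shard: str
    hostname: str
    typical_replag_s: int
    max_runtime_s: int


# Example rows for the hypothetical per-host shards table (assumed values).
HOSTS = [
    ShardHost("s1", "s1.web.db.svc", 3, 30),
    ShardHost("s1", "s1.analytics.db.svc", 60, 3600),
]


def pick_host(hosts, shard, needed_runtime_s):
    """Return the least-lagged host of a shard that allows the needed runtime."""
    candidates = [
        h
        for h in hosts
        if h.shard == shard and h.max_runtime_s >= needed_runtime_s
    ]
    if not candidates:
        raise LookupError(
            f"no host for {shard} allows queries of {needed_runtime_s}s"
        )
    return min(candidates, key=lambda h: h.typical_replag_s)


# A short interactive query lands on the web host, a long report on analytics.
print(pick_host(HOSTS, "s1", 10).hostname)
print(pick_host(HOSTS, "s1", 600).hostname)
```

A third tier would only need a new row in the table, which is the automatic switch-over described above, at the price of someone keeping those parameters accurate.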

This might also be a good time to get rid of the '.labsdb' postfix in the 'slice' column (see https://wikitech.wikimedia.org/w/index.php?title=Help_talk:Toolforge/Database&oldid=1791013#Identifying_lag for discussion (and possibly some later edits)).

See also T176886: Update meta_p database for new service names