Provide wiki metadata in the databases similar to toolserver.wiki
Closed, ResolvedPublic

Description

Toolserver has the local table toolserver.wiki on all databases that provides metadata about the wikis including the server the wiki's database is kept on:

| mysql> SELECT * FROM toolserver.wiki LIMIT 5;
| +----------------+------+------------+------------------+------+---------+-----------+--------------+--------------+---------------+--------+-------------+
| | dbname         | lang | family     | domain           | size | is_meta | is_closed | is_multilang | is_sensitive | root_category | server | script_path |
| +----------------+------+------------+------------------+------+---------+-----------+--------------+--------------+---------------+--------+-------------+
| | aawikibooks_p  | aa   | wikibooks  | NULL             |    3 |       0 |         1 |            0 |            0 | NULL          |      3 | /w/         |
| | aawiki_p       | aa   | wikipedia  | NULL             |    6 |       0 |         1 |            0 |            0 | NULL          |      3 | /w/         |
| | aawiktionary_p | aa   | wiktionary | NULL             |    1 |       0 |         1 |            0 |            1 | NULL          |      3 | /w/         |
| | abwiki_p       | ab   | wikipedia  | ab.wikipedia.org |  807 |       0 |         0 |            0 |            0 | NULL          |      3 | /w/         |
| | abwiktionary_p | ab   | wiktionary | NULL             |    0 |       0 |         1 |            0 |            1 | NULL          |      3 | /w/         |
| +----------------+------+------------+------------------+------+---------+-----------+--------------+--------------+---------------+--------+-------------+
| 5 rows in set (0.00 sec)

| mysql>

Most of the information can probably be extracted from operations/mediawiki-config, but I don't know which sources there are authoritative.

See also: T50625: Provide namespace IDs and names in the databases similar to toolserver.namespace

Details

Reference
bz48626
bzimport raised the priority of this task from to Normal.
bzimport set Reference to bz48626.
scfc created this task.May 20 2013, 4:35 AM
scfc added a comment.May 25 2013, 2:58 AM

Played around with:

include ($MediaWikiRepoPath . "/includes/Defines.php");
include ($WmfConfigRepoPath . "/wmf-config/InitialiseSettings.php");
var_dump ($wgConf->settings);

but it doesn't yield for example information about de.wikipedia.org.

(In reply to comment #1)

Played around with:

include ($MediaWikiRepoPath . "/includes/Defines.php");
include ($WmfConfigRepoPath . "/wmf-config/InitialiseSettings.php");
var_dump ($wgConf->settings);

but it doesn't yield for example information about de.wikipedia.org.

Some experiments:

$ php maintenance/eval.php
$wgDBname='zhwiki';
$wmfRealm='production';
$mwConfigDir="$IP/../operations/mediawiki-config";
$wmfConfigDir="$mwConfigDir/wmf-config";
function getRealmSpecificFilename($p){global $IP,$wmfConfigDir;return str_replace($p,$IP,$wmfConfigDir);}
function wmfLoadInitialiseSettings($c){global $wmfConfigDir;require("$wmfConfigDir/InitialiseSettings.php");}
require("$wmfConfigDir/wgConf.php");
list($site,$lang)=$wgConf->siteFromDB($wgDBname);
$wikiTags=array();
$mwConfigDirHandle=opendir($mwConfigDir);
while(($f=readdir($mwConfigDirHandle))!==false){if(pathinfo($f,PATHINFO_EXTENSION)==='dblist'&&in_array($wgDBname,array_map('trim',file("$mwConfigDir/$f")))){$wikiTags[]=pathinfo($f,PATHINFO_FILENAME);}}
$dbSuffix = ( $site === 'wikipedia' ) ? 'wiki' : $site;
$wgConf->loadFullData();
$globals = $wgConf->getAll( $wgDBname, $dbSuffix,array('lang' => $lang,'site' => $site,'stdlogo' => "//upload.wikimedia.org/$site/$lang/b/bc/Wiki.png"), $wikiTags );
print_r($globals);

Array
(
    [wgLegacyEncoding] => 
    [wgCapitalLinks] => 1
    ...
)

Do we want a database table consisting of three columns: wiki, config_variable_name, and config_variable_value (as a serialized blob)?
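For illustration only, the three-column layout asked about above could look roughly like the following. This is a minimal sketch, not an agreed design: the table name wiki_config, the helper names, the use of JSON as the serialization format, and in-memory SQLite standing in for the replica database are all assumptions made for the example.

```python
import json
import sqlite3

# Assumption: in-memory SQLite stands in for the real database server.
conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE wiki_config (            -- hypothetical table name
        wiki                  TEXT NOT NULL,
        config_variable_name  TEXT NOT NULL,
        config_variable_value BLOB NOT NULL,  -- serialized value (JSON here)
        PRIMARY KEY (wiki, config_variable_name)
    )
    """
)


def store_config(conn, wiki, settings):
    """Serialize each setting to JSON and store one row per variable."""
    rows = [
        (wiki, name, json.dumps(value).encode("utf-8"))
        for name, value in settings.items()
    ]
    conn.executemany("INSERT OR REPLACE INTO wiki_config VALUES (?, ?, ?)", rows)


def load_config(conn, wiki, name):
    """Return the deserialized value, or None if the variable is not stored."""
    row = conn.execute(
        "SELECT config_variable_value FROM wiki_config "
        "WHERE wiki = ? AND config_variable_name = ?",
        (wiki, name),
    ).fetchone()
    return None if row is None else json.loads(row[0].decode("utf-8"))


# The two variables visible in the print_r() output above.
store_config(conn, "zhwiki", {"wgLegacyEncoding": False, "wgCapitalLinks": True})
```

A serialized blob keeps the schema independent of the type of each variable, at the cost that values cannot be filtered in SQL without deserializing them first.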

I think we should have a discussion about what the current "toolserver" database is, what we want in the future, and whether we care about breaking backward compatibility.

Some of the design decisions in some of the database tables could probably be re-thought, but only if we're willing to break the current interfaces.

In addition, I think we should only rely on MediaWiki's API for this information (with user authentication, as necessary). This is the cleanest and sanest way to accurately get this information, as far as I know.

coren added a comment.Jun 2 2013, 7:50 PM

(In reply to comment #4)

In addition, I think we should only rely on MediaWiki's API for this
information (with user authentication, as necessary).

This is particularly important in that some extensions may have hard-to-evaluate effects on some configuration values (namespaces and usergroups being the more obvious cases).

I should say that any necessary configuration value that cannot be fetched through the API should be /added/ to the API rather than fetched through an alternative scheme.

  • Marc
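As a rough sketch of the API-based approach described above, the following asks a wiki's api.php for action=query&meta=siteinfo and maps the "general" block to a row of metadata. The helper names, the output column names, the User-Agent string and the sample payload are assumptions for illustration; only the siteinfo query itself is standard MediaWiki API.

```python
import json
import urllib.parse
import urllib.request


def siteinfo_url(domain, script_path="/w"):
    """Build the api.php URL for a general siteinfo query."""
    query = urllib.parse.urlencode(
        {"action": "query", "meta": "siteinfo", "siprop": "general", "format": "json"}
    )
    return f"https://{domain}{script_path}/api.php?{query}"


def fetch_siteinfo(domain):
    """Fetch and decode the siteinfo response (needs network access)."""
    request = urllib.request.Request(
        siteinfo_url(domain),
        headers={"User-Agent": "meta-wiki-sketch/0.1 (illustrative example)"},
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)


def wiki_row(payload):
    """Map the 'general' siteinfo block to a flat metadata row."""
    general = payload["query"]["general"]
    return {
        "dbname": general["wikiid"],
        "lang": general["lang"],
        "name": general["sitename"],
        "url": general["server"],
        "script_path": general["scriptpath"],
    }


# Made-up sample payload in the shape of a siteinfo response.
sample = {
    "query": {
        "general": {
            "wikiid": "abwiki",
            "lang": "ab",
            "sitename": "Wikipedia",
            "server": "//ab.wikipedia.org",
            "scriptpath": "/w",
        }
    }
}
row = wiki_row(sample)
```

Because the API is per wiki, a meta table built this way would need one such request per wiki, driven by some list of domains obtained elsewhere.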

API is per wiki. toolserver.wiki is a meta table.

coren added a comment.Jul 12 2013, 9:33 PM

Yes, but you need to populate that table from /somewhere/. :-)

coren added a comment.Aug 27 2013, 8:40 PM

I've added a table with automatically maintained meta information
about the replicated databases: meta_p.wiki (which is available on every
shard).

+------------------+--------------+------+-----+---------+-------+
| Field            | Type         | Null | Key | Default | Extra |
+------------------+--------------+------+-----+---------+-------+
| dbname           | varchar(32)  | NO   | PRI | NULL    |       |
| lang             | varchar(12)  | NO   |     | en      |       |
| name             | text         | YES  |     | NULL    |       |
| family           | text         | YES  |     | NULL    |       |
| url              | text         | YES  |     | NULL    |       |
| size             | decimal(1,0) | NO   |     | 1       |       |
| slice            | text         | NO   |     | NULL    |       |
| is_closed        | decimal(1,0) | NO   |     | 0       |       |
| has_echo         | decimal(1,0) | NO   |     | 0       |       |
| has_flaggedrevs  | decimal(1,0) | NO   |     | 0       |       |
| has_visualeditor | decimal(1,0) | NO   |     | 0       |       |
| has_wikidata     | decimal(1,0) | NO   |     | 0       |       |
+------------------+--------------+------+-----+---------+-------+

There is a lingering issue with the 'name' column, which seems to
improperly encode the wiki name when non-ASCII characters are involved;
that will get fixed once I manage to beat some sense into mysql.

Most columns are self-explanatory, and I can add a few more depending on
demand. In the meantime, (dbname, slice) provides the much requested
mapping between databases and slices.

decimal(1,0) ? This seems strange. Shouldn't those is_* and has_* be BOOL aka. TINYINT(1) ?

coren added a comment.Aug 27 2013, 8:47 PM

I did not want to rely on the existence of bool, which isn't ANSI; mysql "helpfully" translated my numeric(1) to decimal(1,0).

Would it be a problem to rename slice to server, in order to match the column name of toolserver?

The name column looks good to me from a quick look, btw.

coren added a comment.Aug 27 2013, 8:50 PM

It would be possible, but probably unhelpful: from what I understand, the server column is numeric whereas I provide actual host names. Keeping the column named the same with changed semantics seems to be asking for trouble IMO (i.e., better that a select fails than that it returns a string which code with poor error checking misinterprets as an integer).
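The failure mode described above can be sketched as follows. The two parsers and the host name are hypothetical; lenient_int only imitates the forgiving, atoi-style conversion that some languages and database layers apply to strings.

```python
import re


def lenient_int(value):
    """Imitate atoi-style parsing: use leading digits, otherwise return 0."""
    match = re.match(r"\s*[+-]?\d+", str(value))
    return int(match.group()) if match else 0


def strict_int(value):
    """Parse an integer, raising ValueError on anything non-numeric."""
    return int(value)


# Old semantics: a numeric server ID parses the same either way.
old_style = lenient_int("3")

# New semantics: a host name silently becomes server 0 with lenient parsing,
# whereas strict parsing would raise ValueError and expose the problem.
new_style = lenient_int("db-host.example")
```

With a differently named column, code written for the old table fails at the query itself instead of continuing with a bogus server number.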

coren added a comment.Aug 28 2013, 7:44 PM

Added a meta_p.legacy view that has the same column name and order as toolserver.wiki for legacy purposes.

Please note that the semantics of the 'server' column differ, and there may be other subtle differences from the toolserver's table that are not immediately evident. Unless the same code base has to run on both labs and the toolserver for the interval while it still has replication, transitioning to meta_p.wiki is preferable.

valhallasw updated the task description.Jan 18 2015, 2:12 PM
valhallasw set Security to None.
Ricordisamoa added a subscriber: Ricordisamoa.
jeremyb added a subscriber: jeremyb.Mar 9 2015, 7:16 AM