Provide wiki metadata in the databases similar to toolserver.wiki
Closed, Resolved (Public)

Description

Toolserver has the local table toolserver.wiki on all databases. It provides metadata about the wikis, including the server on which each wiki's database is kept:

| mysql> SELECT * FROM toolserver.wiki LIMIT 5;
| +----------------+------+------------+------------------+------+---------+-----------+--------------+--------------+---------------+--------+-------------+
| | dbname         | lang | family     | domain           | size | is_meta | is_closed | is_multilang | is_sensitive | root_category | server | script_path |
| +----------------+------+------------+------------------+------+---------+-----------+--------------+--------------+---------------+--------+-------------+
| | aawikibooks_p  | aa   | wikibooks  | NULL             |    3 |       0 |         1 |            0 |            0 | NULL          |      3 | /w/         |
| | aawiki_p       | aa   | wikipedia  | NULL             |    6 |       0 |         1 |            0 |            0 | NULL          |      3 | /w/         |
| | aawiktionary_p | aa   | wiktionary | NULL             |    1 |       0 |         1 |            0 |            1 | NULL          |      3 | /w/         |
| | abwiki_p       | ab   | wikipedia  | ab.wikipedia.org |  807 |       0 |         0 |            0 |            0 | NULL          |      3 | /w/         |
| | abwiktionary_p | ab   | wiktionary | NULL             |    0 |       0 |         1 |            0 |            1 | NULL          |      3 | /w/         |
| +----------------+------+------------+------------------+------+---------+-----------+--------------+--------------+---------------+--------+-------------+
| 5 rows in set (0.00 sec)

| mysql>

Most of the information can probably be extracted from operations/mediawiki-config, but I don't know which sources there are authoritative.

See also: T50625: Provide namespace IDs and names in the databases similar to toolserver.namespace

Details

Reference
bz48626

Event Timeline

bzimport raised the priority of this task to Medium. Nov 22 2014, 1:18 AM
bzimport added a project: Toolforge.
bzimport set Reference to bz48626.

Played around with:

include ($MediaWikiRepoPath . "/includes/Defines.php");
include ($WmfConfigRepoPath . "/wmf-config/InitialiseSettings.php");
var_dump ($wgConf->settings);

but it doesn't yield, for example, information about de.wikipedia.org.

(In reply to comment #1)

Some experiments:

$ php maintenance/eval.php

$wgDBname='zhwiki';

$wmfRealm='production';

$mwConfigDir="$IP/../operations/mediawiki-config";

$wmfConfigDir="$mwConfigDir/wmf-config";

function getRealmSpecificFilename($p){global $IP,$wmfConfigDir;return str_replace($p,$IP,$wmfConfigDir);}

function wmfLoadInitialiseSettings($c){global $wmfConfigDir;require("$wmfConfigDir/InitialiseSettings.php");}

require("$wmfConfigDir/wgConf.php");

list($site,$lang)=$wgConf->siteFromDB($wgDBname);

$wikiTags=array();

$mwConfigDirHandle=opendir($mwConfigDir);

while(($f=readdir($mwConfigDirHandle))!==false){if(pathinfo($f,PATHINFO_EXTENSION)==='dblist'&&in_array($wgDBname,array_map('trim',file("$mwConfigDir/$f")))){$wikiTags[]=pathinfo($f,PATHINFO_FILENAME);}}

$dbSuffix = ( $site === 'wikipedia' ) ? 'wiki' : $site;

$wgConf->loadFullData();

$globals = $wgConf->getAll( $wgDBname, $dbSuffix,array('lang' => $lang,'site' => $site,'stdlogo' => "//upload.wikimedia.org/$site/$lang/b/bc/Wiki.png"), $wikiTags );

print_r($globals);

Array
(
    [wgLegacyEncoding] => 
    [wgCapitalLinks] => 1
    ...
)

Do we want a database table consisting of three columns: wiki, config_variable_name, and config_variable_value (as a serialized blob)?
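To make the three-column idea concrete, here is a minimal sketch in Python. It assumes an in-memory SQLite database as a stand-in for the replica, a hypothetical table name `wiki_config`, and JSON as the serialization format; none of these were decided in this task, and the two sample settings are only illustrative.

```python
import json
import sqlite3

# Assumption: SQLite stands in for the real database; "wiki_config" is a
# hypothetical table name, and JSON is one possible serialization.
conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE wiki_config (
        wiki                  TEXT NOT NULL,
        config_variable_name  TEXT NOT NULL,
        config_variable_value BLOB NOT NULL,
        PRIMARY KEY (wiki, config_variable_name)
    )
    """
)


def store_settings(conn, wiki, settings):
    """Insert one row per configuration variable, serializing each value."""
    rows = [
        (wiki, name, json.dumps(value).encode("utf-8"))
        for name, value in settings.items()
    ]
    conn.executemany("INSERT INTO wiki_config VALUES (?, ?, ?)", rows)


def load_setting(conn, wiki, name):
    """Return the deserialized value, or None if the variable is not stored."""
    row = conn.execute(
        "SELECT config_variable_value FROM wiki_config"
        " WHERE wiki = ? AND config_variable_name = ?",
        (wiki, name),
    ).fetchone()
    return None if row is None else json.loads(row[0].decode("utf-8"))


# Illustrative values only, loosely modelled on the print_r output above.
store_settings(conn, "zhwiki", {"wgLegacyEncoding": False, "wgCapitalLinks": True})
print(load_setting(conn, "zhwiki", "wgCapitalLinks"))
```

A serialized blob keeps the schema trivial, but the values can then no longer be filtered or joined in SQL, which is the trade-off such a layout would need to accept.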

I think we should have a discussion about what the current "toolserver" database is, what we want in the future, and whether we care about breaking backward compatibility.

Some of the design decisions in some of the database tables could probably be re-thought, but only if we're willing to break the current interfaces.

In addition, I think we should only rely on MediaWiki's API for this information (with user authentication, as necessary). This is the cleanest and sanest way to accurately get this information, as far as I know.

(In reply to comment #4)

In addition, I think we should only rely on MediaWiki's API for this
information (with user authentication, as necessary).

This is particularly important in that some extensions may have hard-to-evaluate effects on some configuration values (namespaces and user groups being the more obvious cases).

I should say that any necessary configuration value that cannot be fetched through the API should be /added/ to the API rather than fetched through an alternative scheme.

-- Marc

API is per wiki. toolserver.wiki is a meta table.

Yes, but you need to populate that table from /somewhere/. :-)
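As a rough illustration of populating a meta table from the per-wiki API, the sketch below builds a `meta=siteinfo` request URL for one wiki and maps a response onto a row. The function names, the row layout, and the abridged sample response are assumptions made for this example; no network request is made, and this is not how any such table was actually filled.

```python
from urllib.parse import urlencode


def siteinfo_url(domain, script_path="/w/"):
    """Build the api.php URL that asks one wiki for its general site info."""
    query = urlencode(
        {"action": "query", "meta": "siteinfo", "siprop": "general", "format": "json"}
    )
    return f"https://{domain}{script_path}api.php?{query}"


def row_from_siteinfo(dbname, response):
    """Map a decoded siteinfo response onto a (hypothetical) meta-table row."""
    general = response["query"]["general"]
    return {"dbname": dbname, "lang": general["lang"], "name": general["sitename"]}


# Hand-written, abridged sample of a decoded response (not real output).
sample_response = {"query": {"general": {"sitename": "Wikipedia", "lang": "ab"}}}

print(siteinfo_url("ab.wikipedia.org"))
print(row_from_siteinfo("abwiki", sample_response))
```

One request per wiki is needed, so a populating job would still require a list of wikis and their domains from some other source.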

I've added a table with automatically maintained meta information
about the replicated databases: meta_p.wiki (which is available on every
shard).

+------------------+--------------+------+-----+---------+-------+
| Field            | Type         | Null | Key | Default | Extra |
+------------------+--------------+------+-----+---------+-------+
| dbname           | varchar(32)  | NO   | PRI | NULL    |       |
| lang             | varchar(12)  | NO   |     | en      |       |
| name             | text         | YES  |     | NULL    |       |
| family           | text         | YES  |     | NULL    |       |
| url              | text         | YES  |     | NULL    |       |
| size             | decimal(1,0) | NO   |     | 1       |       |
| slice            | text         | NO   |     | NULL    |       |
| is_closed        | decimal(1,0) | NO   |     | 0       |       |
| has_echo         | decimal(1,0) | NO   |     | 0       |       |
| has_flaggedrevs  | decimal(1,0) | NO   |     | 0       |       |
| has_visualeditor | decimal(1,0) | NO   |     | 0       |       |
| has_wikidata     | decimal(1,0) | NO   |     | 0       |       |
+------------------+--------------+------+-----+---------+-------+

There is a lingering issue with the 'name' column, which seems to
improperly encode the wiki name when non-ASCII characters are involved;
that will get fixed once I manage to beat some sense into mysql.

Most columns are self-explanatory, and I can add a few more depending on
demand. In the meantime, (dbname, slice) provides the much-requested
mapping between databases and slices.
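A small sketch of how a tool might consume the (dbname, slice) mapping follows. It assumes an in-memory SQLite database attached under the name `meta_p` as a stand-in for the replica, and the database names and host names in the sample rows are made up; only the `SELECT dbname, slice FROM meta_p.wiki` query reflects the table described above.

```python
import sqlite3
from collections import defaultdict

# Assumption: SQLite attached as "meta_p" stands in for the replica server.
conn = sqlite3.connect(":memory:")
conn.execute("ATTACH DATABASE ':memory:' AS meta_p")
conn.execute("CREATE TABLE meta_p.wiki (dbname TEXT PRIMARY KEY, slice TEXT NOT NULL)")

# Hypothetical rows; real slice values are host names chosen by the admins.
conn.executemany(
    "INSERT INTO meta_p.wiki (dbname, slice) VALUES (?, ?)",
    [
        ("abwiki", "s3.example.org"),
        ("dewiki", "s5.example.org"),
        ("aawiki", "s3.example.org"),
    ],
)


def databases_by_slice(conn):
    """Group database names by the slice (host) that serves them."""
    grouped = defaultdict(list)
    query = "SELECT dbname, slice FROM meta_p.wiki ORDER BY dbname"
    for dbname, slice_host in conn.execute(query):
        grouped[slice_host].append(dbname)
    return dict(grouped)


print(databases_by_slice(conn))
```

Grouping by slice lets a tool open one connection per host and reuse it for every database on that host.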

decimal(1,0)? This seems strange. Shouldn't those is_* and has_* columns be BOOL, a.k.a. TINYINT(1)?

I did not want to rely on the existence of bool, which isn't ANSI; mysql "helpfully" translated my numeric(1) to decimal(1,0).

Would it be a problem to rename slice to server, to match the column name in toolserver.wiki?

The name column looks good to me from a quick look, btw.

It would be possible, but probably unhelpful: from what I understand, the server column is numeric, whereas I provide actual host names. Keeping the same column name with changed semantics seems to be asking for trouble IMO (i.e., better that a select fails than that it returns a string which is misinterpreted as an integer by code with poor error checking).
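The failure mode described here can be guarded against on the client side. The sketch below is a hypothetical helper, not part of any existing tool: it accepts only a numeric server id and fails loudly when handed a host name instead of silently coercing it.

```python
def legacy_server_number(value):
    """Return a numeric server id unchanged; reject anything else loudly."""
    # bool is a subclass of int in Python, so exclude it explicitly.
    if isinstance(value, int) and not isinstance(value, bool):
        return value
    raise TypeError(f"expected a numeric server id, got {value!r}")


print(legacy_server_number(3))

try:
    # A host name (made up for this example) must not pass as a server id.
    legacy_server_number("s3.example.org")
except TypeError as error:
    print(f"rejected: {error}")
```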

Added a meta_p.legacy view that has the same column names and order as toolserver.wiki for legacy purposes.

Please note that the semantics of the 'server' column differ, and there may be other subtle differences from the toolserver's table that are not immediately evident. Unless the same code base has to run on both labs and the toolserver for the interval while it still has replication, transitioning to meta_p.wiki is preferable.