The XML dump contains a siteinfo header with a <namespaces> tag that is very useful for processing the text in the dumps. It looks something like this:
<mediawiki ...snip... >
  <siteinfo>
    <sitename>Վիքիպեդիա</sitename>
    <base>http://hy.wikipedia.org/wiki/%D4%B3%D5%AC%D5%AD%D5%A1%D5%BE%D5%B8%D6%80_%D5%A7%D5%BB</base>
    <generator>MediaWiki 1.23wmf15</generator>
    <case>first-letter</case>
    <namespaces>
      <namespace key="-2" case="first-letter">Մեդիա</namespace>
      <namespace key="-1" case="first-letter">Սպասարկող</namespace>
      <namespace key="0" case="first-letter" />
      <namespace key="1" case="first-letter">Քննարկում</namespace>
      <namespace key="2" case="first-letter">Մասնակից</namespace>
      ...snip...
    </namespaces>
  </siteinfo>
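To illustrate what a dump consumer can currently recover from this header, here is a minimal Python sketch that maps each namespace key to its localized name. The sample document, the `xmlns` URI on the root element, and the helper names `local_name` and `dump_namespaces` are assumptions made for illustration; they are not taken from the dump tooling.

```python
import xml.etree.ElementTree as ET

# Hypothetical, shortened header. The xmlns URI is an assumption;
# real dumps declare whichever export schema version they were written with.
SAMPLE_HEADER = """<mediawiki xmlns="http://www.mediawiki.org/xml/export-0.8/">
  <siteinfo>
    <sitename>Վիքիպեդիա</sitename>
    <namespaces>
      <namespace key="-2" case="first-letter">Մեդիա</namespace>
      <namespace key="0" case="first-letter" />
      <namespace key="1" case="first-letter">Քննարկում</namespace>
    </namespaces>
  </siteinfo>
</mediawiki>"""


def local_name(tag):
    """Strip the '{namespace-uri}' prefix that ElementTree adds to tag names."""
    return tag.rsplit("}", 1)[-1]


def dump_namespaces(xml_text):
    """Return {namespace key: localized name} from a dump header."""
    root = ET.fromstring(xml_text)
    result = {}
    for elem in root.iter():
        if local_name(elem.tag) == "namespace":
            # The main namespace (key 0) has no text, so store "" for it.
            result[int(elem.get("key"))] = elem.text or ""
    return result
```

Only the localized names come back, so a title such as "Talk:Foo" or "Image:Foo.png" cannot be resolved to a namespace from the dump header alone.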
Unfortunately, this header does not include canonical namespace names or namespace aliases. However, an API request with "meta=siteinfo" does return them. For example, the call to http://hy.wikipedia.org/w/api.php?action=query&meta=siteinfo&siprop=namespaces|namespacealiases returns the following XML:
<api>
  <query>
    <namespaces>
      <ns id="-2" case="first-letter" canonical="Media" xml:space="preserve">Մեդիա</ns>
      <ns id="-1" case="first-letter" canonical="Special" xml:space="preserve">Սպասարկող</ns>
      <ns id="0" case="first-letter" content="" xml:space="preserve" />
      <ns id="1" case="first-letter" subpages="" canonical="Talk" xml:space="preserve">Քննարկում</ns>
      <ns id="2" case="first-letter" subpages="" canonical="User" xml:space="preserve">Մասնակից</ns>
      ...snip...
    </namespaces>
    <namespacealiases>
      <ns id="6" xml:space="preserve">Image</ns>
      <ns id="7" xml:space="preserve">Image talk</ns>
    </namespacealiases>
  </query>
</api>
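As a rough sketch of why this extra metadata matters, the following Python snippet builds a single lookup table from every known name (localized, canonical, or alias) to its namespace id, using a shortened copy of the API response above. The function name `namespace_lookup` and the trimmed sample are hypothetical, and the snippet parses a saved response rather than calling the API.

```python
import xml.etree.ElementTree as ET

# Shortened copy of the API response shown above (illustrative only).
SAMPLE_API = """<api>
  <query>
    <namespaces>
      <ns id="-2" case="first-letter" canonical="Media" xml:space="preserve">Մեդիա</ns>
      <ns id="0" case="first-letter" content="" xml:space="preserve" />
      <ns id="1" case="first-letter" subpages="" canonical="Talk" xml:space="preserve">Քննարկում</ns>
    </namespaces>
    <namespacealiases>
      <ns id="6" xml:space="preserve">Image</ns>
      <ns id="7" xml:space="preserve">Image talk</ns>
    </namespacealiases>
  </query>
</api>"""


def namespace_lookup(api_xml):
    """Map every localized name, canonical name, and alias to its namespace id."""
    root = ET.fromstring(api_xml)
    lookup = {}
    for ns in root.findall("./query/namespaces/ns"):
        ns_id = int(ns.get("id"))
        # Both the localized text and the canonical attribute are valid prefixes.
        for name in (ns.text, ns.get("canonical")):
            if name:  # skip the main namespace, which has no name
                lookup[name] = ns_id
    for alias in root.findall("./query/namespacealiases/ns"):
        if alias.text:
            lookup[alias.text] = int(alias.get("id"))
    return lookup
```

With such a table, a prefix like "Talk" or "Image" found in dump text can be resolved to its namespace id, which the dump header by itself does not allow.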
The XML dump should be updated to include this important metadata about namespaces.
Version: 1.23.0
Severity: enhancement
See Also:
https://bugzilla.wikimedia.org/show_bug.cgi?id=40010