
namespace should have its own XML tag
Closed, Resolved, Public

Description

Currently, the namespace and the article title are merged into a single <title> tag. A separate <namespace> tag would make parsing the dumps much easier.


Version: unspecified
Severity: enhancement

Details

Reference
bz27775

Event Timeline

bzimport raised the priority of this task from to Medium. Nov 21 2014, 11:27 PM
bzimport set Reference to bz27775.

I will briefly expand on this. Right now, to determine whether an article belongs to the main namespace, you have to rule out every other namespace: you iterate over all of the wiki's local namespace names and check that the article title does not begin with any of them. Only if none of them match can you conclude that the article belongs to the main namespace. This is a lot of extra work, and a separate <namespace>0</namespace> tag would be ideal.
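A minimal Python sketch of the prefix-matching workaround described above. The `NAMESPACE_NAMES` table is a hypothetical subset; a real parser would load the wiki's local names from the dump's `<siteinfo>` block and would also have to deal with aliases and case rules, which this sketch ignores.

```python
# Hypothetical subset of one wiki's local namespace names; a real parser
# would read them from the dump's <siteinfo> block.
NAMESPACE_NAMES = {
    1: "Talk",
    2: "User",
    6: "File",
    10: "Template",
    14: "Category",
}


def namespace_from_title(title: str) -> int:
    """Infer a namespace number by ruling out every non-main namespace."""
    prefix, sep, _rest = title.partition(":")
    if sep:
        # Iterate over every known local name and compare it with the prefix.
        for key, name in NAMESPACE_NAMES.items():
            if prefix == name:
                return key
    # Nothing matched, so the page is assumed to be in the main namespace.
    return 0


print(namespace_from_title("Talk:Paris"))             # prints 1
print(namespace_from_title("Star Wars: Episode IV"))  # prints 0
```

The second title contains a colon yet belongs to the main namespace, which is why a plain "does the title contain a colon" test is not enough and every namespace name has to be checked.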

I agree. The text matching that is currently necessary should not be needed. Besides the suggested <namespace> tag, I would also suggest the more concise <ns> tag, or, even better, simply adding an "ns" or "namespace" attribute to the <title> tag.
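To compare the two proposals, here is a hedged Python sketch that reads the namespace number from either encoding. Both sample pages are hypothetical; the attribute form in particular is only the suggestion above, not an existing dump format.

```python
import xml.etree.ElementTree as ET

# Hypothetical encodings of the same page, one per proposal.
AS_ELEMENT = "<page><title>Paris</title><ns>0</ns></page>"
AS_ATTRIBUTE = '<page><title ns="0">Paris</title></page>'


def read_ns(page_xml: str) -> int:
    """Return the namespace number from an <ns> child or a title attribute."""
    page = ET.fromstring(page_xml)
    ns_elem = page.find("ns")
    if ns_elem is not None:
        return int(ns_elem.text)
    title = page.find("title")
    if title is not None and title.get("ns") is not None:
        return int(title.get("ns"))
    raise ValueError("page carries no namespace information")


print(read_ns(AS_ELEMENT), read_ns(AS_ATTRIBUTE))  # prints 0 0
```

Either form lets a parser read the number directly instead of matching title text.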

Created attachment 8963
This patch adds a new <ns> tag to each <page> tag.
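With an `<ns>` tag, selecting main-namespace pages becomes a numeric comparison. The following Python sketch streams a dump with `xml.etree.ElementTree.iterparse`; it assumes `<ns>` is a direct child of `<page>` holding an integer, and the `SAMPLE` fragment is made up. The helper `local()` strips the `{namespace-uri}` prefix that ElementTree adds to tag names when the root element declares an XML namespace.

```python
import io
import xml.etree.ElementTree as ET


def local(tag: str) -> str:
    """Drop the '{namespace-uri}' prefix ElementTree adds to tag names."""
    return tag.rsplit("}", 1)[-1]


def main_namespace_titles(stream):
    """Yield the titles of pages whose <ns> child is 0, streaming the dump."""
    for _event, elem in ET.iterparse(stream, events=("end",)):
        if local(elem.tag) != "page":
            continue
        title, ns = None, None
        for child in elem:
            if local(child.tag) == "title":
                title = child.text
            elif local(child.tag) == "ns":
                ns = int(child.text)
        if ns == 0:
            yield title
        elem.clear()  # free the finished page's children and text


# Hypothetical two-page dump fragment.
SAMPLE = b"""<mediawiki>
  <page><title>Paris</title><ns>0</ns></page>
  <page><title>Talk:Paris</title><ns>1</ns></page>
</mediawiki>"""

print(list(main_namespace_titles(io.BytesIO(SAMPLE))))  # prints ['Paris']
```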


Just a note: please make sure that the XML dump version number is bumped whenever a dump feature is added, so that parsers that must handle every dump version can enable support for each feature based on the version number. This can make the parsing code faster. The version number was not changed when the <redirect> tag was added.
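A sketch of this kind of version gating, in Python. It reads the `version` attribute of the dump's root `<mediawiki>` element and compares it against `NS_TAG_MIN_VERSION`, a hypothetical threshold that would have to be set to whichever dump version actually introduces `<ns>`.

```python
import io
import xml.etree.ElementTree as ET

# Hypothetical: the first dump version assumed to include the <ns> tag.
NS_TAG_MIN_VERSION = (0, 5)


def parse_version(text: str) -> tuple:
    """Turn a version string like '0.4' into (0, 4) for ordered comparison."""
    return tuple(int(part) for part in text.split("."))


def dump_version(stream) -> tuple:
    """Return the root element's version on the first start event.

    Only the beginning of the stream is read; the rest is never parsed.
    """
    for _event, root in ET.iterparse(stream, events=("start",)):
        return parse_version(root.get("version", "0"))


def supports_ns_tag(version: tuple) -> bool:
    return version >= NS_TAG_MIN_VERSION


sample = io.BytesIO(b'<mediawiki version="0.4"><siteinfo/></mediawiki>')
print(supports_ns_tag(dump_version(sample)))  # prints False
```

Versions are compared as integer tuples rather than floats so that a hypothetical "0.10" sorts after "0.5".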

sumanah wrote:

Added the "patch" and "need-review" keywords; Mark hopes to get someone to review the patch soon.

This patch looks ok to me.

Bear in mind that the namespaces can change in the middle of a dump run, for example if a custom namespace is added to accommodate content that the community wishes to move out of the main namespace. That won't happen often, but dump users will probably get bitten by it once in a while.
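One hedged way to cope with namespaces that change mid-run, sketched in Python: build the namespace table from the dump's `<siteinfo>` block, and when a page's `<ns>` number is missing from that table, fall back to the title prefix instead of failing. The `<siteinfo>` excerpt, the key 100 and the "Portal" name are illustrative assumptions, and the sketch ignores the XML namespace URI that real dumps declare on their elements.

```python
import xml.etree.ElementTree as ET

# Hypothetical <siteinfo> excerpt, as written at the start of a dump run.
SITEINFO = """<siteinfo><namespaces>
  <namespace key="0" />
  <namespace key="1">Talk</namespace>
</namespaces></siteinfo>"""


def namespace_map(siteinfo_xml: str) -> dict:
    """Map namespace numbers to local names ('' for the main namespace)."""
    root = ET.fromstring(siteinfo_xml)
    return {int(ns.get("key")): (ns.text or "") for ns in root.iter("namespace")}


def namespace_name(ns_key: int, title: str, known: dict) -> str:
    """Resolve a page's namespace name, tolerating numbers added mid-run."""
    if ns_key in known:
        return known[ns_key]
    # Not listed in <siteinfo>: the namespace was probably created after the
    # run started, so guess its name from the title prefix and cache it.
    prefix, sep, _rest = title.partition(":")
    name = prefix if sep else ""
    known[ns_key] = name
    return name


known = namespace_map(SITEINFO)
print(namespace_name(1, "Talk:Paris", known))      # prints Talk
print(namespace_name(100, "Portal:Paris", known))  # prints Portal
```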