Let bots use canonical namespaces
Closed, Declined, Public

Description

Originally from: http://sourceforge.net/p/pywikipediabot/feature-requests/332/
Reported by: pathoschild
Created on: 2013-02-23 02:45:38
Subject: Let bots use canonical namespaces
Original description:
Let bots use canonical namespaces instead of translations from the family files. This is useful for crosswiki bots, where invalid namespace names may not be detected by the operator (most recently [1][2]).

The attached patch implements this by adding an optional constructor argument to Page. For example, the current behaviour is unchanged:
ns = wikipedia.Page(site, title).namespaceName() # Utilisateur:Pathoschild
But a constructor argument enables canonical namespaces:
ns = wikipedia.Page(site, title, translateNamespace=False).namespaceName() # User:Pathoschild

[1] http://meta.wikimedia.org/wiki/User_talk:Pathoschild?oldid=5269904#Probl.C3.A8me_avec_ton_bot_sur_Wikinews_portugais
[2] http://meta.wikimedia.org/wiki/User_talk:Pathoschild?oldid=5269904#Polish_Wikivoyage_user_js_files


Version: compat-(1.0)
Severity: enhancement
See Also:
https://sourceforge.net/p/pywikipediabot/feature-requests/332

Details

Reference
bz55014

Event Timeline

bzimport raised the priority of this task to Needs Triage. Nov 22 2014, 2:14 AM
bzimport set Reference to bz55014.

Patch which enables canonical namespace support.

The patch is against compat, but core has the same problem.

Core now has a Namespace class, which captures the canonical namespaces data of the wiki. Relevant code at:
https://git.wikimedia.org/blob/pywikibot%2Fcore.git/9b67ec1424d160d1968ef3f3da9f179675d68070/pywikibot%2Fsite.py#L131
https://git.wikimedia.org/blob/pywikibot%2Fcore.git/9b67ec1424d160d1968ef3f3da9f179675d68070/pywikibot%2Fsite.py#L577

The Namespace class isn't utilised in the old link parsing algorithms. There are a few core changesets related to this problem:
https://gerrit.wikimedia.org/r/#/c/150872/ - Link normalization
https://gerrit.wikimedia.org/r/#/c/148337/ - force using namespace param; ignore namespace in title

Aklapper triaged this task as Lowest priority. Jun 5 2015, 1:41 PM
Aklapper subscribed.

Pywikibot has two versions: Compat and Core. This task was filed about the older version, Pywikibot-compat, which is no longer under active development, so I'm lowering the priority of this task to reflect reality. Unfortunately, the Pywikibot team does not have the manpower to retest every bug report and feature request against the maintained Pywikibot code base. The Pywikibot-compat code base also differs a lot from Pywikibot-core, so the problem described in this task might no longer exist. Please help: if you have time and interest in Pywikibot, please upgrade to Pywikibot-core and add a comment to this task if the problem still happens there (or edit the task directly by removing the Pywikibot-compat project and adding the Pywikibot project). To learn more about Pywikibot and to get involved in its development, please check out https://www.mediawiki.org/wiki/Manual:Pywikibot/Development Thank you for your understanding.

This is an issue in core too as @jayvdb mentioned above. However I've moved away from using pywikibot, so I don't personally need this anymore.

@Pathoschild, what are you using now?

If I understand correctly, this request was about outputting canonical namespaces, and possibly even parsing the title using only canonical namespaces.

@jayvdb This patch addressed an issue which affects crosswiki bots like Synchbot. Since the family files aren't always up to date, the bot would sometimes add user pages to the main namespace (because pywikibot replaced "User:" with an outdated namespace translation not recognised by MediaWiki). This didn't happen often, but it was difficult to notice because the bot edits so many wikis. The patch let you optionally use the canonical namespace which MediaWiki automatically translates, eliminating the possibility of outdated translations.
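The idea can be sketched without pywikibot: build the title from a namespace number and MediaWiki's canonical English name, and leave it to the server to resolve the local name. The helper `canonical_title` and the lookup table are hypothetical illustrations, not part of the attached patch or of pywikibot; the table lists only a few of the core namespaces.

```python
# Canonical (English) names of some core MediaWiki namespaces, keyed by number.
# Illustrative subset only; a real script would cover every namespace it needs.
CANONICAL_NAMESPACES = {
    0: "",
    1: "Talk",
    2: "User",
    3: "User talk",
    8: "MediaWiki",
    10: "Template",
    14: "Category",
}


def canonical_title(ns, name):
    """Return a full page title with the canonical namespace prefix."""
    prefix = CANONICAL_NAMESPACES[ns]
    # The main namespace has no prefix and no colon.
    return f"{prefix}:{name}" if prefix else name


print(canonical_title(2, "Pathoschild"))  # User:Pathoschild
```

Because the prefix never comes from a translation table, it cannot go stale when a wiki renames its local namespaces.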

I previously used pywikibot for Synchbot, but I needed to maintain a custom version adapted for crosswiki work (to update outdated family files, fix this namespace issue, suppress is-sysop checks, and a few other changes). I also ran into some significant stability issues when I switched from compat to core (though it may have improved since then). Eventually I moved to the low-level mwclient with some code to fetch wikis from the API sitematrix, which eliminated manual maintenance.
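As a rough sketch of the sitematrix approach (not Synchbot's actual code), the function below collects site URLs from an already decoded `action=sitematrix` JSON response. The response shape is an assumption based on how the API is usually described: numbered language groups with a `site` list, a `specials` list, a `count` key, and a `closed` flag on closed wikis. It should be checked against live output. The function name `wiki_urls` and the sample data with example.org URLs are made up.

```python
def wiki_urls(sitematrix_response, include_closed=False):
    """Collect site URLs from a decoded action=sitematrix response (assumed shape)."""
    urls = []
    for key, group in sitematrix_response["sitematrix"].items():
        if key == "count":
            continue  # total number of sites, not a group
        # Language groups are dicts with a "site" list; "specials" is a plain list.
        sites = group if key == "specials" else group.get("site", [])
        for site in sites:
            if "closed" in site and not include_closed:
                continue
            urls.append(site["url"])
    return urls


# Made-up sample data in the assumed shape.
sample_matrix = {
    "sitematrix": {
        "count": 3,
        "0": {
            "code": "xx",
            "site": [
                {"url": "https://xx.wiki.example.org", "dbname": "xxwiki"},
                {"url": "https://xx.news.example.org", "dbname": "xxnews", "closed": ""},
            ],
        },
        "specials": [{"url": "https://meta.example.org", "dbname": "metawiki"}],
    }
}

print(wiki_urls(sample_matrix))
```

Fetching the list at run time is what removes the manual maintenance: new wikis appear in the matrix without any change to the script.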

Well, the underlying problem of outdated family-file namespaces is definitely killed in core: it dynamically fetches the namespaces from siteinfo. However, I'd need to refresh my memory of the caching used for namespaces to confirm whether the problem has merely been reduced to something much rarer but still possible. It would still be possible to eliminate it completely with an explicit siteinfo namespace refresh before doing any edits (i.e. one line of code in your script); everything after that will use the dynamically obtained namespace data.
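To illustrate what the siteinfo data looks like, here is a minimal sketch that maps namespace numbers to their canonical and local names from a decoded `action=query&meta=siteinfo&siprop=namespaces` response. It assumes the classic JSON format, where the local name sits under the `*` key and the main namespace has no `canonical` entry. The function name `namespace_names` is hypothetical, and this is not how pywikibot core implements its lookup.

```python
def namespace_names(siteinfo_response):
    """Map namespace id -> (canonical name, local name) from a siteinfo reply."""
    names = {}
    for ns in siteinfo_response["query"]["namespaces"].values():
        # Classic format stores the local name under "*"; fall back to "name".
        local = ns.get("*", ns.get("name", ""))
        # The main namespace (id 0) carries no "canonical" key.
        canonical = ns.get("canonical", "")
        names[ns["id"]] = (canonical, local)
    return names


# Trimmed sample in the assumed shape, using French local names.
sample_siteinfo = {
    "query": {
        "namespaces": {
            "0": {"id": 0, "*": ""},
            "2": {"id": 2, "*": "Utilisateur", "canonical": "User"},
            "3": {"id": 3, "*": "Discussion utilisateur", "canonical": "User talk"},
        }
    }
}

print(namespace_names(sample_siteinfo)[2])  # ('User', 'Utilisateur')
```

Running such a lookup once at the start of a script gives fresh names for that run, which is the effect of the explicit refresh described above.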

Also, both compat and core now have ways to specify the namespace of a Page using a namespace number, which is a much saner approach than just using a title (with namespace names and colons embedded). Core does have some bugs with respect to titles and namespaces, but they are very minor ones that can usually be avoided with explicit parameters, and it has an active developer community that does high-quality code reviews to keep good code getting merged frequently, plus 750 unit tests. (i.e. please come back ;-))

Great! It sounds like we can close this ticket. Synchbot is running fine on mwclient, but I'll look into pywikibot again as a possible future change. :)

Honestly I hope Synchbot is made unnecessary as soon as possible.

That too! Though we'll still need it for a while yet to clean up local pages for the transition to global accounts and user pages.

Xqt subscribed.

Won't fix in compat. Use core instead. Namespaces in compat haven't been updated for 11 months.