Family packages
Open, Lowest, Public

Description

One of the problems with a pywikibot package is that it will effectively break whenever a new language is added to a family.
One way to reduce the impact of this problem is to allow family files to be distributed separately from the core library.

It would also be nice to allow people to publish and maintain their own wiki family classes on http://pypi.python.org , which pywikibot would then load dynamically.

With PEP 420, it is possible to set up a namespace package that doesn't depend on setuptools' pkg_resources or the standard library's pkgutil, both of which have problems described in that PEP. PEP 420 is Python 3.3+; however, as Family.load is already doing class loader voodoo and is the only entry point for new family classes, we can probably provide similar functionality on Python 2, and importlib2 might do all of the hard work to provide backwards compatibility.

Also worth mentioning is that OpenStack is moving away from pkg_resources/pkgutil namespace packages because of their oddities. See http://specs.openstack.org/openstack/oslo-specs/specs/kilo/drop-namespace-packages.html and https://etherpad.openstack.org/p/kilo-oslo-namespace-packages

setuptools doesn't have good support for PEP 420 (e.g. https://bitbucket.org/pypa/setuptools/issue/98/having-two-pep-420-implicit-namespace); however, pywikibot's family needs are simple, and the setuptools problems can be worked around quite easily.

It doesn't seem possible to install new family classes into pywikibot.families from another package, because pywikibot/__init__.py is not empty. Unless a solution can be found for that, Family.load could load family classes from a new namespace, pywikibot_families.

setuptools eggs appear to not be PEP420-able, so to create a PEP420 package using setuptools, the setup call needs to include zip_safe=False, and looks like:

from setuptools import setup

setup(
    name='PywikibotWikimediaFamily',
    version='0.1',
    description='Wikimedia configuration for Pywikibot',
    long_description='Wikimedia configuration for Pywikibot',
    maintainer='The Pywikibot team',
    maintainer_email='pywikibot@lists.wikimedia.org',
    license='MIT License',
    packages=['pywikibot_families', 'pywikibot_families.wikimedia'],
    install_requires='pywikibot',
    url='https://www.mediawiki.org/wiki/Pywikibot',
    classifiers=[
        'License :: OSI Approved :: MIT License',
        'Development Status :: 4 - Beta',
        'Operating System :: OS Independent',
        'Intended Audience :: Developers',
        'Environment :: Console',
        'Programming Language :: Python :: 3.3',
    ],
    use_2to3=False,
    zip_safe=False
)

with family modules in pywikibot/families/wikimedia/ and no other files in the pywikibot/ directory tree.
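As a rough illustration of the PEP 420 behaviour this proposal relies on, the sketch below builds two throwaway "distributions" in temporary directories, each contributing a subpackage to the same namespace without any __init__.py. The namespace name demo_pywikibot_families and the module contents are stand-ins invented for the demo, not part of any patch. It needs Python 3.3+.

```python
import importlib
import os
import sys
import tempfile

NAMESPACE = 'demo_pywikibot_families'  # hypothetical stand-in name


def make_portion(subpackage, module, body):
    """Create one fake 'distribution' that contributes to NAMESPACE."""
    root = tempfile.mkdtemp()
    pkg_dir = os.path.join(root, NAMESPACE, subpackage)
    os.makedirs(pkg_dir)  # note: no __init__.py is created anywhere
    with open(os.path.join(pkg_dir, module + '.py'), 'w') as f:
        f.write(body)
    return root


# Two independent install locations, as if from two separate packages.
roots = [
    make_portion('wikimedia', 'wikipedia_family', "NAME = 'wikipedia'\n"),
    make_portion('wikia', 'lyricwiki_family', "NAME = 'lyricwiki'\n"),
]
sys.path[:0] = roots
importlib.invalidate_caches()

wikipedia = importlib.import_module(NAMESPACE + '.wikimedia.wikipedia_family')
lyricwiki = importlib.import_module(NAMESPACE + '.wikia.lyricwiki_family')

# The namespace package's __path__ spans both install locations.
namespace = sys.modules[NAMESPACE]
print(wikipedia.NAME, lyricwiki.NAME, len(list(namespace.__path__)))
```

Both modules import through one shared namespace even though they live under different roots, which is what would let family packages be installed separately from core.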

Event Timeline

jayvdb raised the priority of this task from to Needs Triage.
jayvdb updated the task description.
jayvdb added a project: Pywikibot.
jayvdb updated the task description.
jayvdb set Security to None.
jayvdb added subscribers: XZise, Aklapper, Gallaecio and 2 others.

As for a migration strategy, IMO we could/should

  1. Move all the contents of pywikibot/families into subdirectories like pywikibot/families/wikimedia , pywikibot/families/wikimedia_test and pywikibot/families/i18n (or translatewiki?), with some small tweaks to config2.py and family.py to recursively find family classes under pywikibot/families/.
  2. Have the core setup.py install only the wikimedia_test and i18n family subpackages.
  3. Add a second setup.py (setup_wikimedia_family.py?) that installs a package with all the wikimedia families.

There will need to be some more significant changesets before the implementation is stable. Once it is stable, we push the wikimedia family to a new GitHub repo, create GitHub repos for a few other major groups of wikis, and then delete all of the families in core.

I have an implementation of all the above for Python 3.3+. Just need to tidy it up a bit before pushing up, probably without pywikibot_families support for Python 2.

I currently have some restrictions so that override precedence can be established, but this change doesn't need to revise/standardise the permissible characters in family names/modules (as there are already patches under review which do that):

  1. If pywikibot_families uses a 'name' already used within pywikibot/families , it replaces the existing family class, i.e. someone creating a package providing pywikibot_families.foo.test_family.py replaces pywikibot/families/test_family.py.
  2. The same name can't be provided by two modules within pywikibot_families, i.e. it is illegal to have both pywikibot_families.wikia.roblox_family and pywikibot_families.other.roblox_family, as I can't determine precedence between them, and I do not want to allow '.' in the name given to pywikibot.Site(..) in order to allow both to be loaded.
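The two precedence rules can be sketched as a small index builder. This is only an illustration under assumptions: build_family_index and its arguments are hypothetical names, and the patch itself may establish precedence differently. The only convention taken from this task is the '<name>_family' module naming.

```python
def build_family_index(core_modules, external_modules):
    """Map family name -> dotted module path, applying the two rules.

    core_modules: dotted paths of modules under pywikibot/families.
    external_modules: dotted paths of modules under pywikibot_families.
    """
    def family_name(module_path):
        leaf = module_path.rsplit('.', 1)[-1]
        if not leaf.endswith('_family'):
            raise ValueError('not a family module: %s' % module_path)
        return leaf[:-len('_family')]

    index = {family_name(m): m for m in core_modules}
    external = {}
    for module_path in external_modules:
        name = family_name(module_path)
        if name in external:
            # Rule 2: two external providers of one name is an error,
            # because there is no way to rank them.
            raise ValueError('family %r provided by both %s and %s'
                             % (name, external[name], module_path))
        external[name] = module_path
    index.update(external)  # Rule 1: external packages override core.
    return index


index = build_family_index(
    ['pywikibot.families.test_family',
     'pywikibot.families.wikipedia_family'],
    ['pywikibot_families.foo.test_family'],
)
print(index['test'])
```

Raising on duplicates, rather than silently picking one, keeps the name passed to pywikibot.Site(..) unambiguous without needing '.' in family names.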

On Python 3.3+, it appears to be possible for a separate package to install family files into pywikibot.families if pywikibot/__init__.py includes:

from pkgutil import extend_path
__path__ = extend_path(__path__, __name__)

That means the pywikibot namespace isn't using PEP 420, but it can contain PEP 420 namespaces.
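A minimal sketch of that arrangement, using temporary directories and a stand-in package name (demo_core is hypothetical and takes the place of pywikibot), might look like the following. It assumes Python 3.3+, where extend_path also picks up directories that have no __init__.py.

```python
import importlib
import os
import sys
import tempfile

CORE = 'demo_core'  # hypothetical stand-in for the pywikibot package

# Root A: the "core" distribution, a regular package whose non-empty
# __init__.py extends its own __path__.
root_a = tempfile.mkdtemp()
os.makedirs(os.path.join(root_a, CORE))
with open(os.path.join(root_a, CORE, '__init__.py'), 'w') as f:
    f.write('from pkgutil import extend_path\n'
            '__path__ = extend_path(__path__, __name__)\n'
            'CORE_LOADED = True\n')

# Root B: a separately installed distribution that ships only family
# modules; neither demo_core/ nor families/ has an __init__.py here.
root_b = tempfile.mkdtemp()
families_dir = os.path.join(root_b, CORE, 'families')
os.makedirs(families_dir)
with open(os.path.join(families_dir, 'foo_family.py'), 'w') as f:
    f.write("NAME = 'foo'\n")

sys.path[:0] = [root_a, root_b]
importlib.invalidate_caches()

core = importlib.import_module(CORE)
foo = importlib.import_module(CORE + '.families.foo_family')
print(core.CORE_LOADED, foo.NAME, len(core.__path__))
```

Here demo_core stays a regular package with its own code, while demo_core.families is resolved as a PEP 420 namespace supplied entirely by the second root.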

This appears not to work in lower versions of Python, as pywikibot/__init__.py cannot include other code.
https://www.python.org/dev/peps/pep-0420/#namespace-packages-today says the following regarding pkgutil:

As a consequence, the package's __init__.py cannot practically define any names as it depends on the order of the package fragments on sys.path to determine which portion is imported first.

Anyway, on Python 3.3+ we can delete pywikibot/families/__init__.py so that the namespace pywikibot.families is using PEP 420, even when the pywikibot namespace is not. Works lovely under Python 3.3+ but completely fails in earlier versions.

Setuptools' pkg_resources.declare_namespace is a very large hack that I still don't completely understand, and the complexity involved worries me. I have also read a lot of similar warnings that __init__.py cannot include other code anyway. Depending on setuptools will probably annoy folks stuck on older systems.

Also, importlib2 says it is Python 2.7+ and hasn't been updated for six months, so it might be unmaintained, and it definitely doesn't work as-is on Python 2.6:

Traceback (most recent call last):
  File "<string>", line 20, in <module>
  File "/tmp/pip-build-jB5cAM/importlib2/setup.py", line 13, in <module>
    vers = _util.load_version()
  File "_util/__init__.py", line 19, in load_version
    import _version
  File "/tmp/pip-build-jB5cAM/importlib2/importlib2/_version/__init__.py", line 27, in <module>
    VERSION = '{}.{}'.format(PY_VERSION, RELEASE)
ValueError: zero length field name in format

So, I built my own much simpler hack to load the real pywikibot __init__.py, to be called from the __init__.py of a separately distributed package in case that package's pywikibot directory is loaded first.

Change 221637 had a related patch set uploaded (by John Vandenberg):
[WIP] Support custom families in pywikibot.families

https://gerrit.wikimedia.org/r/221637

After a bit of playing around, it seems that pkgutil.walk_packages can't be used to find namespace packages within a namespace package.

The following returns 'wikia' and 'wikia.lyricwiki_family':

./pywikibot/__init__.py
./pywikibot/families/wikia/__init__.py
./pywikibot/families/wikia/lyricwiki_family.py

However, the following returns nothing:

./pywikibot/__init__.py
./pywikibot/families/wikia/lyricwiki_family.py

I can't find a bug for this in http://bugs.python.org/ , and maybe this is even expected behaviour, but I found it to be inconsistent with the intent of PEP 420.
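The difference can be reproduced with a throwaway layout. In this sketch the package names are invented to avoid clashing with anything installed, and the temporary root is put on sys.path because walk_packages imports each package it finds in order to recurse into it.

```python
import os
import pkgutil
import sys
import tempfile


def discovered(with_init):
    """Return the names pkgutil.walk_packages reports for a toy layout."""
    root = tempfile.mkdtemp()
    # Hypothetical package names, distinct per case to avoid import caching.
    pkg_name = 'demo_wikia_init' if with_init else 'demo_wikia_ns'
    pkg_dir = os.path.join(root, pkg_name)
    os.makedirs(pkg_dir)
    open(os.path.join(pkg_dir, 'lyricwiki_family.py'), 'w').close()
    if with_init:
        open(os.path.join(pkg_dir, '__init__.py'), 'w').close()
    sys.path.insert(0, root)  # needed so walk_packages can import and recurse
    try:
        return [info[1] for info in pkgutil.walk_packages([root])]
    finally:
        sys.path.remove(root)


print(discovered(with_init=True))
print(discovered(with_init=False))
```

With the __init__.py present both the package and its family module are listed; without it the directory is skipped entirely, because pkgutil only treats a directory as a package when it contains an __init__ module.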

The current patch works on Py2.6+, and achieves my main objectives of a) not automatically loading the overlaid subpackages or family classes, and b) having the family modules be real modules in sys.modules. It also achieves a less important goal of having a single namespace for all family modules, and that namespace can be extended further by updating the package path.

As a result, there is very little performance lost in splitting the family classes into lots of family packages, and having family classes for lots of wikis.
Family.load does iterate over all family modules to determine the list of valid Family names (and which subpackage each is found in), which is a performance hit. However, that can easily be changed so that it first attempts to load the family by name, which finds family modules at the top level (or in a subpackage if/when we permit '.' in the name), and catches the import error as the signal that it needs to scan all family packages.
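A minimal sketch of that "try by name, then scan" idea follows. load_family_module, top_level and scan_all are hypothetical names for illustration, with scan_all standing in for the existing full scan; this is not the code in the patch.

```python
import importlib


def load_family_module(name, top_level, scan_all):
    """Try a direct import first and scan only when that fails.

    top_level: dotted package assumed to hold '<name>_family' modules.
    scan_all: callable returning {family name: dotted module path}.
    """
    try:
        return importlib.import_module('%s.%s_family' % (top_level, name))
    except ImportError:
        # Not at the top level: fall back to the expensive full scan.
        index = scan_all()
        if name not in index:
            raise KeyError('unknown family: %s' % name)
        return importlib.import_module(index[name])
```

One caveat of this approach is that an ImportError raised from inside a family module (for example a missing dependency) is indistinguishable here from "module not found", so a real implementation would probably need to tell the two apart.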

Another way to avoid scanning all family packages is for each family subpackage to include a directory of the family modules/classes it contains. This directory could also help speed up our from_url algorithm, as the wildcarded domains could also be included, so from_url could skip loading all modules when the subpackage domain list indicates the modules therein won't help resolve the URL. This will be especially helpful for large wiki farms which put each wiki on a subdomain, as one wildcard entry would be sufficient to exclude all of the families in that wiki farm (e.g. the majority of Wikia wikis).
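As a sketch of how such a directory could prune the from_url search, assume each subpackage publishes a list of domain patterns. The DOMAIN_DIRECTORIES structure, its contents and the function name below are all invented for illustration; no such format exists yet.

```python
from fnmatch import fnmatch

# Hypothetical per-subpackage directories: subpackage -> domain patterns.
DOMAIN_DIRECTORIES = {
    'wikia': ['*.wikia.com'],  # one wildcard covers the whole wiki farm
    'wikimedia': ['*.wikipedia.org', 'commons.wikimedia.org'],
}


def candidate_subpackages(hostname, directories=DOMAIN_DIRECTORIES):
    """Return the subpackages whose directory may cover hostname."""
    return sorted(
        subpackage
        for subpackage, patterns in directories.items()
        if any(fnmatch(hostname, pattern) for pattern in patterns)
    )


print(candidate_subpackages('en.wikipedia.org'))
print(candidate_subpackages('lyrics.wikia.com'))
```

Only the modules in the returned subpackages would then need to be imported, so a URL on an unrelated domain would never trigger loading every Wikia family.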

Properly loading packages using ordinary import will resolve a Pywikibot-Documentation problem
https://integration.wikimedia.org/ci/job/tox-doc-trusty/1073/consoleFull

23:08:00 /mnt/jenkins-workspace/workspace/tox-doc-trusty/docs/api_ref/pywikibot.families.rst:159: WARNING: autodoc: failed to import module 'pywikibot.families.wikimedia_family'; the following exception was raised:
23:08:00 Traceback (most recent call last):
23:08:00   File "/mnt/jenkins-workspace/workspace/tox-doc-trusty/.tox/doc/lib/python3.4/site-packages/sphinx/ext/autodoc.py", line 385, in import_object
23:08:00     __import__(self.modname)
23:08:00 ImportError: No module named 'pywikibot.families.wikimedia_family'
Xqt triaged this task as Lowest priority. (Oct 30 2016, 11:48 AM)
Xqt removed jayvdb as the assignee of this task. (Feb 19 2020, 7:03 AM)