
generate_family_file creates invalid file for wikidata and commons
Closed, Declined · Public

Description

If "generate interwiki links" is enabled when generating a family file for Wikidata or Commons, every Wikipedia site is added to self.langs.
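The following rough sketch shows how the symptom can arise. It is not the script's actual code. It assumes that the generator treats every interwiki-map entry carrying a `language` key as another language of the same wiki. The entry shape follows the MediaWiki `meta=siteinfo&siprop=interwikimap` API. The function name `langs_from_interwikimap` and the sample entries are hypothetical.

```python
from urllib.parse import urlparse


def langs_from_interwikimap(interwikimap):
    """Hypothetical reconstruction: map each interlanguage prefix to a host.

    Assumes `interwikimap` is the list returned by the MediaWiki API
    (action=query&meta=siteinfo&siprop=interwikimap), where interlanguage
    prefixes carry a 'language' key.
    """
    langs = {}
    for entry in interwikimap:
        if 'language' not in entry:
            continue  # ordinary interwiki prefix, not a language link
        # urlparse also handles protocol-relative URLs such as //hu.wikipedia.org/...
        langs[entry['prefix']] = urlparse(entry['url']).netloc
    return langs


# Illustrative sample data: on Commons the interlanguage prefixes point at
# Wikipedia, so the "other languages" of Commons come out as Wikipedia hosts.
sample_map = [
    {'prefix': 'hu', 'language': 'magyar', 'url': 'https://hu.wikipedia.org/wiki/$1'},
    {'prefix': 'vec', 'language': 'vèneto', 'url': 'https://vec.wikipedia.org/wiki/$1'},
    {'prefix': 'wikt', 'url': 'https://en.wiktionary.org/wiki/$1'},
]
print(langs_from_interwikimap(sample_map))
# {'hu': 'hu.wikipedia.org', 'vec': 'vec.wikipedia.org'}
```

Under that assumption, the `'hu': 'hu.wikipedia.org'` style entries in the generated file below are what you would expect to see.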

$ python pwb.py generate_family_file.py https://commons.wikimedia.org/wiki/Main_Page commons_generated
Generating family file from https://commons.wikimedia.org/wiki/Main_Page

==================================
api url: https://commons.wikimedia.org/w/api.php
MediaWiki version: 1.25wmf12
==================================

Determining other languages...aa ab ace af ak als am an ang ar arc arz as ast av ay az ba bar bat-smg bcl be be-tarask be-x-old bg bh bi bjn bm bn bo bpy br bs bug bxr ca cbk-zam cdo ce ceb ch cho chr chy ckb co cr crh cs csb cu cv cy da de diq dsb dv dz ee egl el eml en eo es et eu ext fa ff fi fiu-vro fj fo fr frp frr fur fy ga gag gan gd gl glk gn got gsw gu gv ha hak haw he hi hif ho hr hsb ht hu hy hz ia id ie ig ii ik ilo io is it iu ja jbo jv ka kaa kab kbd kg ki kj kk kl km kn ko koi kr krc ks ksh ku kv kw ky la lad lb lbe lez lg li lij lmo ln lo lt ltg lv lzh mai map-bms mdf mg mh mhr mi min mk ml mn mo mr mrj ms mt mus mwl my myv mzn na nah nan nap nb nds nds-nl ne new ng nl nn no nov nrm nso nv ny oc om or os pa pag pam pap pcd pdc pfl pi pih pl pms pnb pnt ps pt qu rm rmy rn ro roa-rup roa-tara ru rue rup rw sa sah sc scn sco sd se sg sgs sh si simple sk sl sm sn so sq sr srn ss st stq su sv sw szl ta te tet tg th ti tk tl tn to tpi tr ts tt tum tw ty tyv udm ug uk ur uz ve vec vep vi vls vo vro wa war wo wuu xal xh xmf yi yo yue za zea zh zh-classical zh-cn zh-min-nan zh-tw zh-yue zu

There are 301 languages available.
Do you want to generate interwiki links? This might take a long time. ([y]es/[N]o/[e]dit)y
Loading wikis... 
  * aa... downloaded
  * ab... downloaded
  * ace... downloaded
  * af... downloaded
  * ak... downloaded
...
  * zh... downloaded
  * zh-classical... in cache
  * zh-cn... in cache
  * zh-min-nan... downloaded
  * zh-tw... in cache
  * zh-yue... in cache
  * zu... downloaded
  * en... in cache
Writing pywikibot/families/commons_generated_family.py... 

$ head -20 pywikibot/families/commons_generated_family.py 
# -*- coding: utf-8 -*-
"""
This family file was auto-generated by $Id: 2dd21e4aaf7a93cf8749be841552881a80684b52 $
Configuration parameters:
  url = https://commons.wikimedia.org/wiki/Main_Page
  name = commons_generated

Please do not commit this to the Git repository!
"""

from pywikibot import family

class Family(family.Family):
    def __init__(self):
        family.Family.__init__(self)
        self.name = 'commons_generated'
        self.langs = {
            'hu': 'hu.wikipedia.org',
            'vec': 'vec.wikipedia.org',
            'bpy': 'bpy.wikipedia.org',

Event Timeline

jayvdb set the priority of this task to Needs Triage.
jayvdb updated the task description.
jayvdb added a project: Pywikibot.
jayvdb added subscribers: Unknown Object (MLST), jayvdb.
valhallasw claimed this task.

I'm going to leave this as it is. `generate_family_file.py` was never built for complicated setups and, as this example shows, it breaks on them. It's not obvious what the correct solution should be, and I think it's reasonable to trust the interwiki map.

This is `interwiki_forward = 'wikipedia'` mode, which oddly isn't set in wikidata_family.py. We need to fix our family file, and we should also fix the generator so it detects when all ISO codes are redirected to another wiki family that is _known_.
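One way the proposed detection might look is sketched below, assuming the generator has already collected a `{code: host}` dict like the `self.langs` above. The function name `detect_forward_family` is hypothetical. So is the `known_families` list, and so is the assumption that every wiki in such a family lives at `<something>.<family>.org`. The check uses a host suffix rather than an exact `<code>.<family>.org` match, so aliases that resolve to another code's wiki still count.

```python
def detect_forward_family(langs, known_families):
    """Return the known family every language code forwards to, or None.

    `langs` maps a language code to a host, e.g. {'hu': 'hu.wikipedia.org'}.
    `known_families` is a hypothetical list of family names whose wikis are
    assumed to live at <something>.<family>.org.
    """
    if not langs:
        return None
    for family in known_families:
        suffix = f'.{family}.org'
        # Every host must belong to this one family for it to count.
        if all(host.endswith(suffix) for host in langs.values()):
            return family
    return None


generated = {'hu': 'hu.wikipedia.org', 'vec': 'vec.wikipedia.org', 'bpy': 'bpy.wikipedia.org'}
print(detect_forward_family(generated, ['wikipedia', 'wiktionary']))  # wikipedia
```

If it returns a family name, one possible design is for the generator to write `interwiki_forward = '<name>'` into the family file instead of copying every host into `self.langs`.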

valhallasw closed this task as Declined. (Edited Jan 30 2015, 8:33 AM)

I disagree. This is outside the scope of the simple sites and families that `generate_family_file.py` is meant for. If anything, this is a bug in the Commons interwiki configuration, since an interwiki map is supposed to be bijective.
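A minimal sketch of one narrow reading of "bijective" follows: each target wiki should be reached by exactly one prefix, so the map is injective. The `non_bijective_targets` helper is hypothetical. The sample aliases (`zh-cn` and `zh-tw` resolving to `zh.wikipedia.org`) are illustrative assumptions. The "in cache" lines in the log above suggest that several prefixes resolved to a wiki that had already been loaded.

```python
from collections import defaultdict


def non_bijective_targets(langs):
    """Return {host: [prefixes]} for every host reached by more than one prefix.

    A prefix -> host map is a bijection onto its image only if no host
    appears twice, so an empty result means the map is injective.
    """
    by_host = defaultdict(list)
    for prefix, host in langs.items():
        by_host[host].append(prefix)
    return {host: sorted(prefixes)
            for host, prefixes in by_host.items() if len(prefixes) > 1}


# Illustrative data: two alias prefixes pointing at the same wiki.
sample_langs = {
    'zh': 'zh.wikipedia.org',
    'zh-cn': 'zh.wikipedia.org',
    'zh-tw': 'zh.wikipedia.org',
    'hu': 'hu.wikipedia.org',
}
print(non_bijective_targets(sample_langs))
# {'zh.wikipedia.org': ['zh', 'zh-cn', 'zh-tw']}
```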

The wikidata_family.py issue is a different one.