
generate_family_file.py fails on wiki-en.genealogy.net
Closed, Resolved, Public

Description

$ python ./generate_family_file.py http://wiki-en.genealogy.net/Main_Page gene
Generating family file from http://wiki-en.genealogy.net/Main_Page
http://wiki-en.genealogy.net/Main_Page
Traceback (most recent call last):
  File "./generate_family_file.py", line 319, in <module>
    FamilyFileGenerator(*sys.argv[1:]).run()
  File "./generate_family_file.py", line 95, in run
    w = Wiki(self.base_url)
  File "./generate_family_file.py", line 242, in __init__
    uo = urlopen(fromurl)
  File "./generate_family_file.py", line 47, in urlopen
    uo = urllib2.urlopen(req)
  File "/usr/lib64/python2.7/urllib2.py", line 127, in urlopen
    return _opener.open(url, data, timeout)
  File "/usr/lib64/python2.7/urllib2.py", line 410, in open
    response = meth(req, response)
  File "/usr/lib64/python2.7/urllib2.py", line 523, in http_response
    'http', request, response, code, msg, hdrs)
  File "/usr/lib64/python2.7/urllib2.py", line 448, in error
    return self._call_chain(*args)
  File "/usr/lib64/python2.7/urllib2.py", line 382, in _call_chain
    result = func(*args)
  File "/usr/lib64/python2.7/urllib2.py", line 531, in http_error_default
    raise HTTPError(req.get_full_url(), code, msg, hdrs, fp)
urllib2.HTTPError: HTTP Error 504: Gateway Time-out

It works a bit better on wiki-de.genealogy.net:

$ python ./generate_family_file.py http://wiki-de.genealogy.net/Main_Page gene
Generating family file from http://wiki-de.genealogy.net/Main_Page
http://wiki-de.genealogy.net/Main_Page

==================================
api url: http://wiki-de.genealogy.net/w/api.php
MediaWiki version: 1.14.1
==================================

Determining other languages...http://wiki-de.genealogy.net/w/api.php?action=query&meta=siteinfo&siprop=interwikimap&sifilteriw=local&format=json
de en nl sv

There are 4 languages available.
Do you want to generate interwiki links? This might take a long time. ([y]es/[N]o/[e]dit)y
Loading wikis... 
  * de... in cache
  * en... http://wiki-en.genealogy.net/
HTTP Error 504: Gateway Time-out
  * nl... http://wiki-nl.genealogy.net/wiki/
downloaded
  * sv... http://wiki-sv.genealogy.net/
HTTP Error 500: Internal Server Error
Writing pywikibot/families/gene_family.py... 


The resulting family file only includes nl and de:

# -*- coding: utf-8 -*-
"""
This family file was auto-generated by $Id: 185033971c163ea46b2b1904773b8c407069a4d0 $
Configuration parameters:
  url = http://wiki-de.genealogy.net/Main_Page
  name = gene

Please do not commit this to the Git repository!
"""

from pywikibot import family

class Family(family.Family):
    def __init__(self):
        family.Family.__init__(self)
        self.name = 'gene'
        self.langs = {
            'nl': 'wiki-nl.genealogy.net',
            'de': 'wiki-de.genealogy.net',
        }



    def scriptpath(self, code):
        return {
            'nl': '/w',
            'de': '/w',
        }[code]

    def version(self, code):
        return {
            'nl': u'1.14.1',
            'de': u'1.14.1',
        }[code]

http://wiki-en.genealogy.net/w/api.php and http://wiki-en.genealogy.net/w/api.php fail

http://wiki-sv.genealogy.net/w/api.php works for me
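
The "Loading wikis..." behaviour above, where en and sv fail and only nl and de end up in the family file, can be illustrated with a rough sketch. This is not the code in generate_family_file.py (which used Python 2's urllib2 at the time); it is a Python 3 stand-in, and the names `load_wikis` and `fetch` are hypothetical. It only shows the general idea of recording a per-language HTTP failure instead of letting the exception abort the whole run.

```python
from urllib.error import HTTPError, URLError
from urllib.request import urlopen


def load_wikis(urls, fetch=urlopen):
    """Fetch each wiki URL, collecting failures instead of raising.

    `urls` maps a language code to a URL. `fetch` is injectable
    (an assumption made for this sketch) so the loop can be
    exercised without network access.
    """
    loaded, failed = {}, {}
    for code, url in urls.items():
        try:
            with fetch(url) as response:
                loaded[code] = response.read()
        except (HTTPError, URLError) as err:
            # For an HTTPError, str(err) reads like
            # "HTTP Error 504: Gateway Time-out".
            failed[code] = str(err)
    return loaded, failed
```

With a loop of this shape, a 504 on one language is reported next to that language and the remaining ones are still tried, which matches the second log above but not the first, where the same error on the starting URL ends in a traceback.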

Event Timeline

bzimport raised the priority of this task from to Needs Triage. Nov 22 2014, 3:55 AM
bzimport set Reference to bz72846.
bzimport added a subscriber: Unknown Object (????).
jayvdb set Security to None.
jayvdb updated the task description.
jayvdb claimed this task.

WFM (works for me).
Almost certainly fixed by the switch to requests, and/or the host server being 'better'.

jayvdb updated the task description.

Well, it was fixed, but now I've caused a regression, and there are some other oddities to investigate when multiple languages are enabled.

Change 235660 had a related patch set uploaded (by John Vandenberg):
Fix HTML regex version detection

https://gerrit.wikimedia.org/r/235660
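
For readers unfamiliar with what "HTML regex version detection" refers to, here is a minimal sketch of the general technique. It is not the patch in change 235660; the pattern and the name `detect_version` are assumptions for illustration. It relies on MediaWiki pages normally carrying a `<meta name="generator" ...>` tag with the version in it.

```python
import re

# Assumed pattern: MediaWiki pages normally include a tag such as
#   <meta name="generator" content="MediaWiki 1.14.1" />
GENERATOR_RE = re.compile(
    r'<meta\s+name="generator"\s+content="MediaWiki\s+([^"]+)"',
    re.IGNORECASE,
)


def detect_version(html):
    """Return the MediaWiki version string found in `html`, or None."""
    match = GENERATOR_RE.search(html)
    return match.group(1) if match else None
```

A page without the generator tag yields None, so a caller has to decide how to proceed when the version cannot be read from the HTML.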

Change 235662 had a related patch set uploaded (by John Vandenberg):
Multi site family detection regression

https://gerrit.wikimedia.org/r/235662

Change 235662 merged by jenkins-bot:
Multi site family detection regression

https://gerrit.wikimedia.org/r/235662

Strange, I don't see a difference between the two URLs that fail.

Change 235660 merged by jenkins-bot:
Fix HTML regex version detection

https://gerrit.wikimedia.org/r/235660

> Strange, I don't see a difference between the two URLs that fail.

They fail in different ways when accessing sv.

"sv" has been removed from the interwikimap on "en": http://wiki-en.genealogy.net/w/api.php?action=query&meta=siteinfo&siprop=interwikimap

"sv" still exists in "de", which now looks much better:

$ python ./generate_family_file.py http://wiki-de.genealogy.net/Main_Page gene
Generating family file from http://wiki-de.genealogy.net/Main_Page
WARNING: Http response status 404

==================================
api url: http://wiki-de.genealogy.net/w/api.php
MediaWiki version: 1.14.1
==================================

Determining other languages...de en nl sv

There are 4 languages available.
Do you want to generate interwiki links?This might take a long time. ([y]es/[N]o/[e]dit)y
Loading wikis... 
  * de... in cache
  * en... downloaded
  * nl... downloaded
WARNING: Http response status 500
  * sv... Unsupported url: http://wiki-sv.genealogy.net/Huvudsida
Writing pywikibot/families/gene_family.py... 
pywikibot/families/gene_family.py already exists. Overwrite? (y/n)y

http://wiki-sv.genealogy.net/Huvudsida now is:

GenWiki has a problem

Sorry! This site is experiencing technical difficulties.

Try waiting a few minutes and reloading.

(Can't contact the database server: Access denied for user 'wiki_sv'@'%' to database 'wiki_commons' (mysql.genealogy.net))
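
The interwikimap comparison made earlier in this comment ("sv" gone from "en", still present in "de") could be checked with a small sketch like the following. It assumes the JSON returned by `action=query&meta=siteinfo&siprop=interwikimap&format=json` has the usual shape, a `query.interwikimap` list whose language entries carry a `language` key. The function names are hypothetical and this is not how generate_family_file.py does it.

```python
import json


def language_prefixes(siteinfo_json):
    """Return the sorted interwiki prefixes that are language links.

    Assumes entries for languages have a "language" key, while other
    interwiki entries (e.g. a "wikipedia" prefix) do not.
    """
    data = json.loads(siteinfo_json)
    entries = data.get("query", {}).get("interwikimap", [])
    return sorted(e["prefix"] for e in entries if "language" in e)


def missing_prefixes(reference, other):
    """Prefixes listed in `reference` but absent from `other`."""
    return sorted(set(reference) - set(other))
```

Feeding it the responses saved from the "de" and "en" api.php URLs would then list which language codes one wiki knows about and the other does not.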