Jouncebot failing to parse Deployments page
Closed, Resolved · Public

Description

So I kind of messed up when testing a patch and upgraded the libraries that Jouncebot was using. Now things go BOOM! when it tries to read the current Deployments page:

ERROR:root:Unhandled exception. Terminating.
Traceback (most recent call last):
  File "./jouncebot/jouncebot.py", line 292, in <module>
    bot.start()
  File "/mnt/nfs/labstore-secondary-tools-project/jouncebot/virtenv/local/lib/python2.7/site-packages/irc/bot.py", line 325, in start
    super(SingleServerIRCBot, self).start()
  File "/mnt/nfs/labstore-secondary-tools-project/jouncebot/virtenv/local/lib/python2.7/site-packages/irc/client.py", line 1247, in start
    self.reactor.process_forever()
  File "/mnt/nfs/labstore-secondary-tools-project/jouncebot/virtenv/local/lib/python2.7/site-packages/irc/client.py", line 271, in process_forever
    consume(infinite_call(one))
  File "/mnt/nfs/labstore-secondary-tools-project/jouncebot/virtenv/local/lib/python2.7/site-packages/more_itertools/recipes.py", line 134, in consume
    deque(iterator, maxlen=0)
  File "/mnt/nfs/labstore-secondary-tools-project/jouncebot/virtenv/local/lib/python2.7/site-packages/jaraco/itertools.py", line 386, in <genexpr>
    return (f() for _ in itertools.repeat(None))
  File "/mnt/nfs/labstore-secondary-tools-project/jouncebot/virtenv/local/lib/python2.7/site-packages/irc/client.py", line 252, in process_once
    self.process_data(i)
  File "/mnt/nfs/labstore-secondary-tools-project/jouncebot/virtenv/local/lib/python2.7/site-packages/irc/client.py", line 216, in process_data
    c.process_data()
  File "/mnt/nfs/labstore-secondary-tools-project/jouncebot/virtenv/local/lib/python2.7/site-packages/irc/client.py", line 580, in process_data
    self._process_line(line)
  File "/mnt/nfs/labstore-secondary-tools-project/jouncebot/virtenv/local/lib/python2.7/site-packages/irc/client.py", line 612, in _process_line
    handler(arguments, command, source, tags)
  File "/mnt/nfs/labstore-secondary-tools-project/jouncebot/virtenv/local/lib/python2.7/site-packages/irc/client.py", line 661, in _handle_other
    self._handle_event(event)
  File "/mnt/nfs/labstore-secondary-tools-project/jouncebot/virtenv/local/lib/python2.7/site-packages/irc/client.py", line 671, in _handle_event
    self.reactor._handle_event(self, event)
  File "/mnt/nfs/labstore-secondary-tools-project/jouncebot/virtenv/local/lib/python2.7/site-packages/irc/client.py", line 395, in _handle_event
    result = handler.callback(connection, event)
  File "/mnt/nfs/labstore-secondary-tools-project/jouncebot/virtenv/local/lib/python2.7/site-packages/irc/client.py", line 1210, in _dispatcher
    method(connection, event)
  File "./jouncebot/jouncebot.py", line 83, in on_welcome
    self.deploy_page.start(self.on_deployment_event)
  File "/mnt/nfs/labstore-secondary-tools-project/jouncebot/jouncebot/deploypage.py", line 47, in start
    self._reparse_on_timer()
  File "/mnt/nfs/labstore-secondary-tools-project/jouncebot/jouncebot/deploypage.py", line 134, in _reparse_on_timer
    self.reparse(set_timer=True)
  File "/mnt/nfs/labstore-secondary-tools-project/jouncebot/jouncebot/deploypage.py", line 72, in reparse
    self._get_page_html(), lxml.etree.HTMLParser())
  File "src/lxml/lxml.etree.pyx", line 3213, in lxml.etree.fromstring (src/lxml/lxml.etree.c:79010)
  File "src/lxml/parser.pxi", line 1848, in lxml.etree._parseMemoryDocument (src/lxml/lxml.etree.c:118341)
  File "src/lxml/parser.pxi", line 1736, in lxml.etree._parseDoc (src/lxml/lxml.etree.c:117021)
  File "src/lxml/parser.pxi", line 1102, in lxml.etree._BaseParser._parseDoc (src/lxml/lxml.etree.c:111265)
  File "src/lxml/parser.pxi", line 595, in lxml.etree._ParserContext._handleParseResultDoc (src/lxml/lxml.etree.c:105109)
  File "src/lxml/parser.pxi", line 706, in lxml.etree._handleParseResult (src/lxml/lxml.etree.c:106817)
  File "src/lxml/parser.pxi", line 646, in lxml.etree._raiseParseError (src/lxml/lxml.etree.c:105963)
XMLSyntaxError: None (line 0)
bd808 created this task. Feb 21 2017, 11:18 PM
Restricted Application added a subscriber: Aklapper. Feb 21 2017, 11:18 PM

Here's the real problem:

2017-02-21 16:36:01,720 - ERROR - Could not fetch page due to exception: HTTPError(u'414 Client Error: Request-URI Too Long for url: https://wikitech.wikimedia.org/w/api.php?action=parse&text=...lots and lots of wikitext here...

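mwclient.client.parse() has been doing a GET request instead of a POST since https://github.com/mwclient/mwclient/commit/1d6177022baab6995e4bfccb37717f77e72b1872, so the entire wikitext of the Deployments page ends up in the query string and the server rejects the oversized URL with a 414. Upgrading the library melted things.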
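
To illustrate the difference, here is a minimal sketch using only the Python 3 standard library (not mwclient, and not the actual Jouncebot patch). The endpoint and the `action=parse` / `text` parameters come from the log line above; `format=json`, the helper name `build_parse_request`, and the sample wikitext are assumptions made up for the example. The requests are only built, never sent.

```python
from urllib.parse import urlencode
from urllib.request import Request

# Endpoint taken from the error message above.
API_URL = "https://wikitech.wikimedia.org/w/api.php"


def build_parse_request(wikitext, use_post=True):
    """Build (but do not send) an action=parse request.

    Hypothetical helper for illustration only; format=json is an assumption.
    """
    fields = urlencode({"action": "parse", "format": "json", "text": wikitext})
    if use_post:
        # Passing a body makes urllib issue a POST; the URL stays short.
        return Request(API_URL, data=fields.encode("utf-8"))
    # Without a body this is a GET and every field lands in the URL.
    return Request(API_URL + "?" + fields)


# Made-up stand-in for a long Deployments page.
wikitext = "== Deployments ==\n" + "* some deployment window\n" * 2000

get_req = build_parse_request(wikitext, use_post=False)
post_req = build_parse_request(wikitext, use_post=True)

print(get_req.get_method(), "URL length:", len(get_req.full_url))
print(post_req.get_method(), "URL length:", len(post_req.full_url),
      "body length:", len(post_req.data))
```

With GET the URL grows with the page, so a large enough page will eventually trip whatever URI length limit the server enforces. With POST the URL is constant and the wikitext travels in the request body.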

Change 339092 had a related patch set uploaded (by BryanDavis):
Use POST when fetching parsed HTML for wikitext

https://gerrit.wikimedia.org/r/339092

Change 339092 merged by jenkins-bot:
Use POST when fetching parsed HTML for wikitext

https://gerrit.wikimedia.org/r/339092

Mentioned in SAL (#wikimedia-labs) [2017-02-22T00:26:31Z] <bd808> Deployed {{gerrit|339092}} for T158715

bd808 closed this task as Resolved. Feb 22 2017, 12:27 AM
bd808 claimed this task.
Restricted Application added a project: User-bd808. Feb 22 2017, 12:27 AM

Fixed in mwclient 0.8.4