
mwparserfromhell ParserError on Premier League
Closed, Resolved · Public

Description

ORES fails to assess revision 106888140, from Premier League 2018-19:

"error": {
  "message": "ParserError: Failed to process datasource.wikitext.revision.wikicode: This is a bug and should be reported. Info: C tokenizer exited with non-empty token stack.\nTraceback (most recent call last):\n  File \"/srv/deployment/ores/venv/lib/python3.5/site-packages/revscoring/dependencies/functions.py\", line 244, in _solve\n    value = dependent(*args)\n  File \"/srv/deployment/ores/venv/lib/python3.5/site-packages/revscoring/dependencies/dependent.py\", line 54, in __call__\n    return self.process(*args, **kwargs)\n  File \"/srv/deployment/ores/venv/lib/python3.5/site-packages/revscoring/features/wikitext/datasources/parsed.py\", line 210, in _process_wikicode\n    return mwparserfromhell.parse(text)\n  File \"/srv/deployment/ores/venv/lib/python3.5/site-packages/mwparserfromhell/utils.py\", line 58, in parse_anything\n    return Parser().parse(value, context, skip_style_tags)\n  File \"/srv/deployment/ores/venv/lib/python3.5/site-packages/mwparserfromhell/parser/__init__.py\", line 93, in parse\n    tokens = self._tokenizer.tokenize(text, context, skip_style_tags)\nmwparserfromhell.parser.ParserError: This is a bug and should be reported. Info: C tokenizer exited with non-empty token stack.\n",
  "type": "CaughtDependencyError"
}

(I haven't checked whether this bug is still present in the latest mwparserfromhell development version or is specific to the version or configuration used on the production hosts.)
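
The escaped traceback is easier to read once the JSON is decoded. A minimal sketch, assuming the error object has the shape shown above; the payload here is abbreviated (the traceback frames are omitted) and `summarize` is a hypothetical helper, not part of ORES or revscoring:

```python
import json

# Abbreviated copy of the error object from the description: only the first
# and last lines of "message" are kept; the traceback frames are omitted.
# The raw string keeps "\n" as a JSON escape so json.loads decodes it.
payload = json.loads(r'''
{
  "message": "ParserError: Failed to process datasource.wikitext.revision.wikicode: This is a bug and should be reported. Info: C tokenizer exited with non-empty token stack.\nTraceback (most recent call last):\nmwparserfromhell.parser.ParserError: This is a bug and should be reported. Info: C tokenizer exited with non-empty token stack.\n",
  "type": "CaughtDependencyError"
}
''')


def summarize(error):
    """Return (type, headline, final exception line) for an error object."""
    lines = error["message"].strip().splitlines()
    return error["type"], lines[0], lines[-1]


kind, headline, last = summarize(payload)
print(kind)
print(headline)
print(last)
```

The last line of the decoded message names the exception that was actually raised, here `mwparserfromhell.parser.ParserError`.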

Event Timeline

Restricted Application added a subscriber: Aklapper.

import mwparserfromhell as mwp
import requests

# Fetch the raw wikitext of the failing revision and check that it parses.
print('mwph: %s' % mwp.__version__)
r = requests.get('https://es.wikipedia.org/wiki/?oldid=106888140&action=raw')
r.raise_for_status()
mwp.parse(r.text)
print('Parsed OK')

$ python test.py
mwph: 0.5.1
Parsed OK

What version of mwph is deployed?

This was fixed in mwparserfromhell v0.5; the latest stable release is 0.5.1, and the bug existed in versions 0.4.4 and earlier. Please upgrade.

We're pinned to 0.4.4; I'll get us upgraded, thanks for the investigation!
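
For anyone checking another deployment, the version comparison can be sketched as below. This assumes plain dotted version strings like the ones quoted in this thread; `FIXED_IN`, `version_tuple` and `is_affected` are hypothetical names, not part of mwparserfromhell.

```python
import re

# First release reported in this thread as containing the fix.
FIXED_IN = (0, 5)


def version_tuple(version):
    """Turn a dotted version string such as '0.4.4' into a tuple of ints.

    Assumption: only the leading numeric components matter; parsing stops
    at the first component that does not start with a digit (e.g. 'dev0').
    """
    parts = []
    for piece in version.split('.'):
        match = re.match(r'\d+', piece)
        if match is None:
            break
        parts.append(int(match.group()))
    return tuple(parts)


def is_affected(version):
    """Return True if the version predates the fix reported for v0.5."""
    return version_tuple(version) < FIXED_IN


for v in ('0.4.4', '0.5', '0.5.1'):
    print(v, 'affected' if is_affected(v) else 'fixed')
```

On a real host the input would be `mwparserfromhell.__version__`, the same attribute the test script above prints.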

Ladsgroup assigned this task to awight.
Ladsgroup moved this task from Parked to Completed on the Machine-Learning-Team (Active Tasks) board.
Ladsgroup subscribed.

This doesn't error out anymore.