Page MenuHomePhabricator

[BLOCKER] Inconsistency in the ParamInfo API cache leads to assertion error
Closed, ResolvedPublic

Description

Adding a start parameter other than ... -start:\! ... seems to be failing when I looked to recontinue a job that I had to stop.

tools.wikisource-bot@tools-bastion-01:~$ python /shared/pywikipedia/core/scripts/touch.py -lang:en -family:wikisource -namespace:104 -start:"Page:Encyclopedia of Virginia Biography volume 2.djvu/232" -log:enWS_104purge -pt:0 -user:billinghurst
Traceback (most recent call last):

File "/shared/pywikipedia/core/scripts/touch.py", line 106, in <module>
  main()
File "/shared/pywikipedia/core/scripts/touch.py", line 90, in main
  elif not genFactory.handleArg(arg) and arg.startswith("-"):
File "/shared/pywikipedia/core/pywikibot/pagegenerators.py", line 674, in handleArg
  site=self.site)
File "/shared/pywikipedia/core/pywikibot/pagegenerators.py", line 849, in AllpagesPageGenerator
  content=content)
File "/shared/pywikipedia/core/pywikibot/tools/__init__.py", line 1248, in wrapper
  return obj(*__args, **__kw)
File "/shared/pywikipedia/core/pywikibot/site.py", line 3668, in allpages
  g_content=content)
File "/shared/pywikipedia/core/pywikibot/site.py", line 1718, in _generator
  gen = gen_class(type_arg, **req_args)
File "/shared/pywikipedia/core/pywikibot/data/api.py", line 2783, in __init__
  QueryGenerator.__init__(self, **kwargs)
File "/shared/pywikipedia/core/pywikibot/data/api.py", line 2434, in __init__
  set(self.modules) & self.site._paraminfo.query_modules_with_limits
File "/shared/pywikipedia/core/pywikibot/data/api.py", line 1033, in query_modules_with_limits
  self.fetch(self.submodules('query', True))
File "/shared/pywikipedia/core/pywikibot/data/api.py", line 948, in submodules
  self.fetch([name])
File "/shared/pywikipedia/core/pywikibot/data/api.py", line 562, in fetch
  self._fetch(modules)
File "/shared/pywikipedia/core/pywikibot/data/api.py", line 674, in _fetch
  for mod in normalized_result.values())
File "/shared/pywikipedia/core/pywikibot/data/api.py", line 723, in _generate_submodules
  self._add_submodules(module, submodules)
File "/shared/pywikipedia/core/pywikibot/data/api.py", line 229, in _add_submodules
  assert modules == self._modules[name]

AssertionError
<type 'exceptions.AssertionError'>
CRITICAL: Closing network session.

@Xqt says looks to be introduced at https://gerrit.wikimedia.org/r/#/c/231594/

Event Timeline

Billinghurst raised the priority of this task from to Needs Triage.
Billinghurst updated the task description. (Show Details)
Billinghurst added a project: Pywikibot.
Billinghurst added subscribers: Billinghurst, Xqt.
Xqt triaged this task as Unbreak Now! priority.Aug 15 2015, 6:32 AM

got this traceback by a simple statement:

>>> import pwb, pywikibot as py
>>> s = py.Site()
>>> p = py.Page(s, 'user:xqt/Test')
>>> t = p.text
Traceback (most recent call last):
  File "<pyshell#3>", line 1, in <module>
    t = p.text
  File "pywikibot\page.py", line 491, in text
    self._text = self.get(get_redirect=True)
  File "pywikibot\tools\__init__.py", line 1248, in wrapper
    return obj(*__args, **__kw)
  File "pywikibot\page.py", line 358, in get
    self._getInternals(sysop)
  File "pywikibot\page.py", line 388, in _getInternals
    self.site.loadrevisions(self, getText=True, sysop=sysop)
  File "pywikibot\site.py", line 3541, in loadrevisions
    step=step, total=total, **rvargs)
  File "pywikibot\site.py", line 1718, in _generator
    gen = gen_class(type_arg, **req_args)
  File "pywikibot\data\api.py", line 2856, in __init__
    QueryGenerator.__init__(self, **kwargs)
  File "pywikibot\data\api.py", line 2436, in __init__
    set(self.modules) & self.site._paraminfo.query_modules_with_limits
  File "pywikibot\data\api.py", line 1035, in query_modules_with_limits
    self.fetch(self.submodules('query', True))
  File "pywikibot\data\api.py", line 950, in submodules
    self.fetch([name])
  File "pywikibot\data\api.py", line 564, in fetch
    self._fetch(modules)
  File "pywikibot\data\api.py", line 676, in _fetch
    for mod in normalized_result.values())
  File "pywikibot\data\api.py", line 725, in _generate_submodules
    self._add_submodules(module, submodules)
  File "pywikibot\data\api.py", line 229, in _add_submodules
    assert modules == self._modules[name]
AssertionError

Change 231751 had a related patch set uploaded (by Xqt):
Revert "[FIX] ParamInfo: Correctly initialize ParamInfo"

https://gerrit.wikimedia.org/r/231751

Xqt renamed this task from Failure of -start parameter with a start page defined to [BLOCKER] API query fails.Aug 15 2015, 6:37 AM
Xqt claimed this task.
Xqt set Security to None.
Xqt added subscribers: jayvdb, XZise.

Change 231751 merged by jenkins-bot:
Revert "[FIX] ParamInfo: Correctly initialize ParamInfo"

https://gerrit.wikimedia.org/r/231751

In addition deleting apicache contents is necessary

Hm, reverting my revert doesn't re-introduce that bug. Must be an other issue which changed the api response because deleting the cache solves it.

Well I haven't tested it yet but I don't know how my patch could be the reason for that bug because the assertion is about the fact that it tries to add a submodule (in this case most likely query's submodules) which are differ from already cached. And with your info that reseting the cache is necessary this shows to me that you had a cache for query's submodules which differs from the current condition. And my patch might look like the reason for the bug because it might query the submodules differently so that it doesn't use the cache but instead query the info anew which is different from the cached and then causes the issue. But to be honest I don't know where my patch is querying the info differently which could cause it to not use the cached value.

XZise removed Xqt as the assignee of this task.
XZise removed a project: Patch-For-Review.
XZise added a subscriber: JAnD.

Okay after today a third one reported exactly the same issue, I was able to work with them to actually check what is happening. The script in P1892 is basically replicating what is happening but actually patching some parts to get the vital information. When executing the script P1893 shows that it first loads the query modules using paraminfo as that is using querymodules and @jayvdb is using that as a hack to get to the query modules without actually querying the query module. That information is from July 25th. Then it queries the query module to get its submodules (like in the traceback of the opening post) and it assumes that the query modules are only there when the query module was actually fetched (which was until now not the case) so it fetched them but that information is from Aug 15th. And that version is now supporting one additional submodule (flowinfo). So the only bit left is to verify that the submodule was actually added somewhere in that range.

My patches (424eaa35 and 8f2fba2b) might have highlighted this issue as they changed the order in which the requests are done and they were done in the time range mentioned. As all three cases were on toollabs maybe @Ladsgroup knows when it's updating the nightly (which is afaik what is used then). In the last case the bot is executed at 05:00, 13:00 and 21:00 UTC so when it builds the version somewhere between 21:00 UTC and 05:00 UTC it may be really 424eaa35 as it was merged Fri Aug 14 14:42:05 2015 +0000 and the first run on Saturday was failing.

In the end the problem is simply that our API cache may just contain inconsistent information and paranoid as I am I added asserts which are detecting that and thus fail. There are two proper fixes for this:

  • Ignore that assertion and hope there is no inconsistency
  • If it detects an inconsistency it invalidates all caches for that site with action=paraminfo and re-initialize it

Anyway I opened it again because reverting my fix is only hiding this issue and not actually solving anything. For now cleaning the cache works. In theory it's possible to write something for the maintenance script to just clean those. For example to print the paraminfo calls to the Commons (+the cache date/time):

python pwb.py scripts/maintenance/cache.py -c "print(entry._uniquedescriptionstr() + '\n{0}\n'.format(entry._cachetime)) if 'Site(\"commons\", \"commons\")' in entry._uniquedescriptionstr() and 'paraminfo' in entry._uniquedescriptionstr() else None"
XZise renamed this task from [BLOCKER] API query fails to [BLOCKER] Inconsistency in the ParamInfo API cache leads to assertion error.Aug 18 2015, 12:31 PM
XZise removed a subscriber: KTC.

If I understand correctly, a quick fix is to ignore cache entries for action=paraminfo with modules containing paraminfo if that cache entry is older than a certain date.

The issue is that there are two ways to get the query modules and they could get out of sync. So I'm not sure how sensible a certain date is. Do you mean relative to “now” or just in this specific case?

Do you mean relative to “now” or just in this specific case?

Can anybody answer that last question? (And is this still "Unbreak now" priority?)

I meant invalidate cache entries which have this specific problem - we know the date range where invalid cache entries could have been created. It isnt too often we have a problem like this, and it is a temporary problem as the cache entries will become invalidated soon anyway. I'd prefer the assertion is kept, and kept simple. Adding the ability to ignore cache entries will help us next time there is a cache problem.

While this bug is incredibly annoying for anyone who faces it, I suspect all people effected have already worked around it, by deleting their apicache. i.e. a fix wont help anyone.
Perhaps we could delete everyones apicache on tool labs..?

Change 234434 had a related patch set uploaded (by John Vandenberg):
Report unexpected API error contents

https://gerrit.wikimedia.org/r/234434

Change 234434 had a related patch set uploaded (by John Vandenberg):
Report unexpected API error contents

Patch has been merged.
Is more to do here, or can this task be closed as resolved?

jayvdb lowered the priority of this task from Unbreak Now! to High.Sep 2 2015, 12:13 PM
jayvdb removed a project: Patch-For-Review.

Change 234434 had a related patch set uploaded (by John Vandenberg):
Report unexpected API error contents

Patch has been merged.
Is more to do here, or can this task be closed as resolved?

I used the wrong bug number on that patch.
The problem isnt resolved, but it becomes less relevant as the days go by, and in ~10 days the problem disappears.

This is occurring again due to another query module being added: contenttranslationsuggestions

Change 236246 had a related patch set uploaded (by John Vandenberg):
Deprecate ParamInfo.query_modules_with_limits

https://gerrit.wikimedia.org/r/236246

Change 236246 merged by jenkins-bot:
Deprecate ParamInfo.query_modules_with_limits

https://gerrit.wikimedia.org/r/236246

Change 243060 had a related patch set uploaded (by John Vandenberg):
Deprecate ParamInfo.query_modules_with_limits

https://gerrit.wikimedia.org/r/243060

Change 243060 merged by jenkins-bot:
Deprecate ParamInfo.query_modules_with_limits

https://gerrit.wikimedia.org/r/243060

Xqt claimed this task.

Cannot reproduce this bug anymore; probably it is solved already. I close this task therefore. Please reopen if this issue still exists.