Page MenuHomePhabricator

On labs Siteinfo is caching time sensitive stuff
Closed, ResolvedPublic

Description

No problems running locally but when on labs using getcurrenttime() is causing issues. getcurrenttime() always seems to be returning the same time, deleting everything in apicache makes the time update once but then it goes back to repeating so it's probably a caching issue.

Code:

import sys, time, pywikibot

def main(*args):
    counter = 0
    wiki_site = pywikibot.Site()
    while counter < 10:
        counter += 1
        print(wiki_site.getcurrenttime().isoformat())
        sys.stdout.flush()
        time.sleep(6)

if __name__ == "__main__":
    main()

Output:

2015-03-28T04:58:27Z
2015-03-28T04:58:33Z
2015-03-28T04:58:33Z
2015-03-28T04:58:33Z
2015-03-28T04:58:33Z
2015-03-28T04:58:33Z
2015-03-28T04:58:33Z
2015-03-28T04:58:33Z
2015-03-28T04:58:33Z
2015-03-28T04:58:33Z

Note that for some reason the first output time is different than the rest. Using siteinfo.get gives a consistent time which updates every time the program is run, but not throughout the program's runtime.

Code:

import sys, time, pywikibot

def main(*args):
    pywikibot.handle_args(args)
    counter = 0
    wiki_site = pywikibot.Site()
    while counter < 10:
        counter += 1
        print(wiki_site.siteinfo.get('time', get_default=False, cache=False, expiry=0))
        sys.stdout.flush()
        time.sleep(6)

if __name__ == "__main__":
    main()

The code above works if you make _is_expired from line 1220 of site.py always return true so the issue may be related to it not correctly identifying what is and isn't considered expired, not sure why there is a difference between labs and local.

Event Timeline

PhantomTech raised the priority of this task from to High.
PhantomTech updated the task description. (Show Details)
PhantomTech added a project: Pywikibot.
PhantomTech subscribed.
Restricted Application added subscribers: Aklapper, Unknown Object (MLST). · View Herald TranscriptMar 28 2015, 2:21 AM
PhantomTech raised the priority of this task from High to Unbreak Now!.Mar 28 2015, 5:05 AM
PhantomTech updated the task description. (Show Details)
PhantomTech set Security to None.
XZise subscribed.

Maybe the version on Cloud-Services is outdated? Is it possible to look in the code on labs and check if site.py does contain the same code as the current version on git? I'm not sure but I think the _is_expired method was at some point checking in the wrong direction (or something like that).

Okay 75e5834d did fix something along the lines, so please check that first. I'll lower the priority by the way because defining it as a bug author doesn't make sense because the bug author always wants it to get fixed.

XZise renamed this task from Time issue to On labs Siteinfo is caching time sensitive stuff.Mar 28 2015, 9:40 AM
XZise claimed this task.
XZise lowered the priority of this task from Unbreak Now! to Medium.

Do you delete the apicache and restart the script or do you delete it while the script runs? You could also (unlike other want to tell you) disable the apicache by setting API_config_expiry = 0 in your user-config.py. Okay in 75e5834d the change wasn't anywhere near 1220. So I guess the line number was a guess? What is interesting that it only fails the second time. Also does 75e5834d probably not apply to this case but I was certain at some point the cache was backwards … hugh.

With a clear apicache, the following requests are made:

  1. 2015-03-28 11:33:26 threadedhttp.py, 215 in request: DEBUG ('https://en.wikipedia.org/w/api.php', 'POST', 'maxlag=5&format=json&meta=siteinfo%7Cuserinfo&action=query&siprop=namespaces%7Cnamespacealiases%7Cgeneral&uiprop=blockinfo%7Chasmsg', {'connection': 'keep-alive', 'Content-Type': 'application/x-www-form-urlencoded', 'user-agent': u'test1 (wikipedia:en; User:Valhallasw) Pywikibot/2.0b2 (g4338) httplib2/0.8 Python/2.7.6.final.0'}, 5, None)
  2. 2015-03-28 11:33:26 threadedhttp.py, 215 in request: DEBUG ('https://en.wikipedia.org/w/api.php', 'POST', 'action=query&meta=userinfo&maxlag=5&uiprop=blockinfo%7Cgroups%7Chasmsg%7Crights&format=json', {'cookie': 'GeoIP=::::v4', 'connection': 'keep-alive', 'Content-Type': 'application/x-www-form-urlencoded', 'user-agent': u'test1 (wikipedia:en; User:Valhallasw) Pywikibot/2.0b2 (g4338) httplib2/0.8 Python/2.7.6.final.0'}, 5, None)
  3. 2015-03-28 11:33:26 threadedhttp.py, 215 in request: DEBUG ('https://en.wikipedia.org/w/api.php', 'POST', 'action=query&meta=userinfo&maxlag=5&uiprop=blockinfo%7Cgroups%7Chasmsg%7Crights&format=json', {'cookie': 'GeoIP=::::v4', 'connection': 'keep-alive', 'Content-Type': 'application/x-www-form-urlencoded', 'user-agent': u'test1 (wikipedia:en; User:Valhallasw) Pywikibot/2.0b2 (g4338) httplib2/0.8 Python/2.7.6.final.0'}, 5, None)
  4. 2015-03-28 11:33:26 threadedhttp.py, 215 in request: DEBUG ('https://en.wikipedia.org/w/api.php', 'POST', 'maxlag=5&format=json&meta=siteinfo%7Cuserinfo&action=query&siprop=general&uiprop=blockinfo%7Chasmsg', {'cookie': 'GeoIP=::::v4', 'connection': 'keep-alive', 'Content-Type': 'application/x-www-form-urlencoded', 'user-agent': u'test1 (wikipedia:en; User:Valhallasw) Pywikibot/2.0b2 (g4338) httplib2/0.8 Python/2.7.6.final.0'}, 5, None)

Only the first of those returns the server time -- I'm not sure why there's several requests for the blockinfo, though. No further requests are made (and I also cannot reproduce the 'first timestamp is different' issue). However, the following does appear in the log:

2015-03-28 11:33:26             api.py,  545 in   _handle_warnings: WARNING  API warning (query): Formatting of continuation data will be changing soon. To continue using the current formatting, use the 'rawcontinue' parameter. To begin using the new format, pass an empty string for 'continue' in the initial query.
2015-03-28 11:33:32             api.py,  545 in   _handle_warnings: WARNING  API warning (query): Formatting of continuation data will be changing soon. To continue using the current formatting, use the 'rawcontinue' parameter. To begin using the new format, pass an empty string for 'continue' in the initial query.
2015-03-28 11:33:39             api.py,  545 in   _handle_warnings: WARNING  API warning (query): Formatting of continuation data will be changing soon. To continue using the current formatting, use the 'rawcontinue' parameter. To begin using the new format, pass an empty string for 'continue' in the initial query.

which suggests the response of siteinfo is parsed every time. This suggests the issue is in the api cache. A simple confirmation if this is clearing apicache/* during the run, which results in a new date being shown.

After puzzling with @XZise on IRC, this is what we think happens:

  • Initially, the Site object has no information loaded.
  • site.version() calls the API with an expiry of 1 day. This loads the siteinfo from the apicache, including the site time
  • site.getcurrenttime() is handled by Siteinfo's internal cache, which has a bug:
12:40 < xzise> return cache_date + expire >= datetime.datetime.utcnow()
12:40 < xzise> that seems wrong to me

(this should be cache_date + expire < datetime.datetime.utcnow())
and thus returns the (max 1 day old) Siteinfo version.

I'm /not/ quite sure why the version request isn't handled by the Siteinfo internal cache, though.

I'm /not/ quite sure why the version request isn't handled by the Siteinfo internal cache, though.

@XZise pointed out to me this is because the comparison is the wrong way around -- non-expired items (such as the site version) are then handled as if they /were/ expired. The same patch fixes both issues.

Change 200300 had a related patch set uploaded (by XZise):
[FIX] Siteinfo: Correctly compare cache time

https://gerrit.wikimedia.org/r/200300

Change 200300 merged by Merlijn van Deen:
[FIX] Siteinfo: Correctly compare cache time

https://gerrit.wikimedia.org/r/200300

Sorry for blaming labs when I the comparison was the wrong way around the hole time already :/