Page MenuHomePhabricator

EventStreams doesnt find any messages anymore
Closed, ResolvedPublic

Description

Since yesterday evening no EventStreams messages are found anymore

>>> import pwb, pywikibot
>>> from pywikibot.comms.eventstreams import EventStreams
>>> stream = EventStreams(stream='recentchange')
>>> change = next(iter(stream))
WARNING: Empty message found.
WARNING: Empty message found.
WARNING: Empty message found.
WARNING: Empty message found.
WARNING: Empty message found.
WARNING: Empty message found.

Event Timeline

Xqt triaged this task as High priority.Jan 11 2018, 12:45 PM

The Event.__dict__ gives always:

{'data': '', 'event': 'message', 'id': None, 'retry': None}

Here the sseclient dict itself:

{'buf': u'\x00\ufffd\ufffdm\x08\x01E\ufffd\x11\\K&=w\'\ufffd?\ufffd\ufffdP \ufffd\x7f\x7fUZO^\u0717\x06l\ufffdw\u06e0\ufffd@;\ufffds\ufffd\x0f\ufffd\x19\ufffdQ\u05f0);\ufffd\x00=vQ\x07\ufffd\x1d\ufffd\ufffd\ufffd|\u0201)I\ufffd\ufffdu)\ufffd\x0ep[\ufffd;\ufffd\ufffd\ufffd>\x1e\ufffd\ufffd\ufffd\ufffdCi\ufffdH\u07c70\'\ufffd\ufffdx\ufffd\ufffd\ufffdWg/\x16\ufffd:\ufffd\ufffd\ufffdYU\ufffd\ufffd.\x07*(\ufffd\u02bd\ufffd!T\ufffds\ufffd8-\ufffd\ufffdI\ufffdi\ufffd\ufffd\ufffdGA\x12\x0fjjp\ufffdT\ufffd\ufffd \ufffda\ufffdlL1\ufffd\ufffd\ufffd\ufffd\ufffd\ufffd\ufffd\ufffdY\ufffd\ufffduK3\ufffdh\ufffd*,\ufffd\x02\ufffd\ufffdZ\ufffd\ufffd\x11\ufffd\ufffd\ufffd\u0361~\ufffdU;\ufffdG\ufffd\ufffd*\ufffd=42,|)\ufffd\ufffd \x12\ufffdXWGe\r\ufffd\x0c\x83&\t\ufffd\ufffd\ufffd(c\ufffd\x04uiE\x04\x0fS\ufffd\ufffd}I\ufffdl\ufffd+N\u0643<]Q\'eT\x14\ufffd\ufffd\ufffd-I\ufffd\ufffdK \x1bUe\x1aA\ufffd9\ufffd\ufffd\ufffd\ufffdK\ufffd\x18\ufffd\ufffds\ufffda,U\r\x00\ufffdQZ\ufffdC.AEw,\ufffd \ufffd\ufffd\ufffd\x1a\ufffd\x0b*\x1f\x00_\ufffd\ufffd\\\ufffd\ufffdb\ufffd\ufffd\x12p*\ufffd\ufffd&\ufffd\ufffdRo\ufffd\u066b\ufffd\ufffdo\ufffd&\ufffdy\ufffd\ufffd\ufffd>\u0230vr\ufffd\ufffd\x030`\ufffdf\ufffd\ufffdr\x0e\ufffd\x13\ufffdK"\ufffd\ufffd9\ufffd\ufffdM\ufffd\ufffd\ufffd\ufffd\u038e\ufffd\ufffd \ufffd\ufffd\ufffd\u0323\ufffdG{{Y\ufffd\x0f\ufffd-&I\ufffd!\ufffd!\ufffd\x1d\x16\ufffd\ufffd\ufffdm\ufffdI\ufffd\ufffd\ufffd"\x1d\x06\x12\ufffd\ufffd\ufffd\x01D\ufffdb:t\ufffd_~s\x18\ufffd{\ufffd\ufffd\x0e8e\x1cs\ufffdR\x13=]\ufffdi&\ufffd\ufffdqf2S-6\ufffd\ufffd\ufffd\ufffd\ufffd\x12uh\ufffd\x11xcz\ufffd6>n\ufffd\ufffd\ufffd\x19\ufffd\ufffdcQA\ufffd\ufffd3\ufffdWH\ufffd\ufffd#\ufffd=\ufffdk9`\ufffdlY\ufffd\ufffdk\ufffd%s\ufffd&l\ufffdB{A&\x11u8\x1a\ufffd5m\ufffdqS\ufffd\ufffd\ufffd3/~\ufffd\ufffd\ufffd\ufffd\ufffd\x05@\ufffd\ufffdG\ufffd\x11\ufffdc\x7f\x04;\ufffd\ufffdh\n',
 'last_id': None,
 'requests_kwargs': {'headers': {'Accept': 'text/event-stream',
                                 'Cache-Control': 'no-cache'},
                     u'timeout': (6.05, 45)},
 'resp': <Response [200]>,
 'resp_file': <requests.packages.urllib3.response.HTTPResponse object at 0x02C7FB90>,
 'retry': 3000,
 'session': None,
 'url': u'https://stream.wikimedia.org/v2/stream/recentchange'}

Ah, I did deploy EventStreams yesterday for T171011. I don't know exactly what caused this change, but I think the event.data is now utf-8 encoded. I don't know why it wouldn't have been before but is now. All tests in JavaScript pass, I think I'll need to add a python based test to the EventStreams test suite.

To fix on your side, you'd either: A. run in python3, where everything is utf-8 anyway, or B. change this line to say element = json.loads(event.data.decode('utf-8').

What it is doing now might be more correct than what it was doing before. But, I also don't want to break users. @Xqt, what do you think I should do? Should we get a fix into pywikibot, or would you prefer if I rolled-back?

I can't reproduce this locally with python2:

$ PYWIKIBOT2_NO_USER_CONFIG=1 python pwb.py shell
No handlers could be found for logger "pywiki"
Welcome to the Pywikibot interactive shell!
>>> import pwb, pywikibot
>>> from pywikibot.comms.eventstreams import EventStreams
>>> stream = EventStreams(stream='recentchange')
>>> change = next(iter(stream))
WARNING: Empty message found.
>>> change
{u'comment': u'/* wbsetdescription-set:1|ace */ krueng di Ind\xf4n\xe8sia, #quickstatements', u'wiki': u'wikidatawiki', u'type': u'edit', u'server_name': u'www.wikidata.org', u'server_script_path': u'/w', u'namespace': 0, u'parsedcomment': u'\u200e<span dir="auto"><span class="autocomment">Changed Achinese description: </span></span> krueng di Ind\xf4n\xe8sia, #quickstatements', u'bot': False, u'server_url': u'https://www.wikidata.org', u'length': {u'new': 4144, u'old': 4152}, u'meta': {u'domain': u'www.wikidata.org', u'partition': 0, u'uri': u'https://www.wikidata.org/wiki/Q25165214', u'offset': 646056187, u'topic': u'eqiad.mediawiki.recentchange', u'request_id': u'e1682804-42c3-4c26-b4d2-bb5ffb7b7285', u'schema_uri': u'mediawiki/recentchange/1', u'dt': u'2018-01-11T15:39:50+00:00', u'id': u'a93f7917-f6e5-11e7-af40-b083fecf0f99'}, u'user': u'Rachmat04', u'timestamp': 1515685190, u'title': u'Q25165214', u'patrolled': True, u'id': 654280778, u'minor': False, u'revision': {u'new': 618889710, u'old': 599864657}}
>>> pywikibot.version.getversion()
u'[https] wikimedia-pywikibot-core.git (d217d05, g8927, 2018/01/11, 12:07:16, ok)'
>>> 
$ PYWIKIBOT2_NO_USER_CONFIG=1 python pwb.py version
No handlers could be found for logger "pywiki"
Pywikibot: [https] wikimedia-pywikibot-core.git (d217d05, g8927, 2018/01/11, 12:07:16, ok)
Release version: 3.0-dev
requests version: 2.9.1
  cacerts: /etc/ssl/certs/ca-certificates.crt
    certificate test: ok
Python: 2.7.12 (default, Nov 20 2017, 18:23:56) 
[GCC 5.4.0 20160609]
PYWIKIBOT2_DIR: Not set
PYWIKIBOT2_DIR_PWB: 
PYWIKIBOT2_NO_USER_CONFIG: 1
Config base dir: /home/zhuyifei1999/mw-dev/pywikibot-core

However, one of three eventstreams consumers of mine (running on toolforge python 2), got significant number of decoding failures recently. (Logs incoming...) Edit: Restart frequency (it automatically restarts if it sees no RC from commonswiki for a minute) looks normal: P6578

To fix on your side, you'd either: A. run in python3, where everything is utf-8 anyway, or B. change this line to say element = json.loads(event.data.decode('utf-8').

event.data in python 2 is an instance of unicode. unicode.decode(encoding) is (near-)equivalent to unicode.encode(sys.getdefaultencoding()).decode(encoding) which makes little sense and results in an error (since default encoding is ascii):

>>> import pwb, pywikibot
>>> from pywikibot.comms.eventstreams import EventStreams
>>> stream = EventStreams(stream='recentchange')
>>> change = next(iter(stream))
WARNING: Empty message found.
>>> event = next(stream.source)
>>> event.data
u'{"bot":false,"comment":"[\u0421\u043e\u0441\u0442\u043e\u044f\u043d\u0438\u0435: \u043e\u0442\u043f\u0430\u0442\u0440\u0443\u043b\u0438\u0440\u043e\u0432\u0430\u043d\u0430]","id":null,"log_action":"approve","log_action_comment":"Stsents \u043e\u0442\u043f\u0430\u0442\u0440\u0443\u043b\u0438\u0440\u043e\u0432\u0430\u043b \u0432\u0435\u0440\u0441\u0438\u044e \u0441\u0442\u0440\u0430\u043d\u0438\u0446\u044b [[\u041a\u0430\u043c\u0435\u043d\u0441\u043a\u0430\u044f (\u0441\u0442\u0430\u043d\u0446\u0438\u044f)]]: [\u0421\u043e\u0441\u0442\u043e\u044f\u043d\u0438\u0435: \u043e\u0442\u043f\u0430\u0442\u0440\u0443\u043b\u0438\u0440\u043e\u0432\u0430\u043d\u0430]","log_id":56743614,"log_params":[90157284,86964853,"20180108211855"],"log_type":"review","meta":{"domain":"ru.wikipedia.org","dt":"2018-01-11T16:19:40+00:00","id":"399fa6f4-f6eb-11e7-97eb-90b11c278bf7","request_id":"06d9f8fb-ee3f-4ba2-80ee-30d249778fb8","schema_uri":"mediawiki/recentchange/1","topic":"eqiad.mediawiki.recentchange","uri":"https://ru.wikipedia.org/wiki/%D0%9A%D0%B0%D0%BC%D0%B5%D0%BD%D1%81%D0%BA%D0%B0%D1%8F_(%D1%81%D1%82%D0%B0%D0%BD%D1%86%D0%B8%D1%8F)","partition":0,"offset":646112725},"namespace":0,"parsedcomment":"[\u0421\u043e\u0441\u0442\u043e\u044f\u043d\u0438\u0435: \u043e\u0442\u043f\u0430\u0442\u0440\u0443\u043b\u0438\u0440\u043e\u0432\u0430\u043d\u0430]","server_name":"ru.wikipedia.org","server_script_path":"/w","server_url":"https://ru.wikipedia.org","timestamp":1515687580,"title":"\u041a\u0430\u043c\u0435\u043d\u0441\u043a\u0430\u044f (\u0441\u0442\u0430\u043d\u0446\u0438\u044f)","type":"log","user":"Stsents","wiki":"ruwiki"}'
>>> event.data.decode('utf-8')
Traceback (most recent call last):
  File "<console>", line 1, in <module>
  File "/usr/lib/python2.7/encodings/utf_8.py", line 16, in decode
    return codecs.utf_8_decode(input, errors, True)
UnicodeEncodeError: 'ascii' codec can't encode characters in position 25-33: ordinal not in range(128)

event.data in python 2 is an instance of unicode

Hm, you are right.

I'm not sure what is going on then. I can't really reproduce this either, but I'm just testing in Python, not with pywikibot.

I have no glue what is failing there. Our pagegenerators_tests on Travis ci works fine for all python versions. On the other hand I have two scripts running with EventStreams and both where failing yesterday about 18.00 after a reestablished connection. The framework codebase was unchanged since two weeks or so. Digging deeper I found out that sseclient runs into a loop with EOFError and there is no response from sseclient. Whereas my developing system gets response from sseclient but empty data as noted above. Maybe the different behavior looks like a different requests library version. I’ll check it this weekend probably. My tests where on python 2.7 because I work on an async implementation of pwb’s EventStreams class with py3 currently. Now I went back to good old compat release to keep my scripts running :)

I have requests 2.7.0 for my production environment; I get no response from sseclient because next(self.resp_iterator) is empty there. For the development environment I have requests 2.13.0. and I get an empty message from sseclient. sseclient is 0.0.18 for production and 0.0.14 for the other System but both files are equal. Curious that requests 2.9.1 works.

Ah, I did deploy EventStreams yesterday for T171011. I don't know exactly what caused this change, but I think the event.data

I don't find a task T171011

To fix on your side, you'd either: A. run in python3, where everything is utf-8 anyway, or B. change this line to say element = json.loads(event.data.decode('utf-8').

data is empty and no json.load s processed

I found out that eventstreams/sseclient doesn't work with requests < 2.9 (for me)

Change 406579 had a related patch set uploaded (by Xqt; owner: Xqt):
[pywikibot/core@master] [bugfix] requests>=2.9 is required for sse client

https://gerrit.wikimedia.org/r/406579

sseclient>=0.0.18 is required for py2

I guess I have it. sseclient >= 0.0.18 and requests >= 2.9 should solve this problem.

The default requests version available on Toolforge is 2.2.1. Is it OK to require 2.9?

@Dalba: It is required to use EventStreams after the last change on server side. I checked every release of requests and sseclient to find the right combination. Anyway it the requests < 2.9 the framework can be used except of EventStreams which either finds empty messages or runs into a loop inside requests without any response. Maybe this should be solved upstreams then.

The default requests version available on Toolforge is 2.2.1. Is it OK to require 2.9?

I am wondering: in current requests-requirements.txt and requirements.txt the entries are
requests>=2.4.1,!=2.18.2
Why does the default version care then?

I think this patch doesn't care current installations. Currently requests>=2.4.1 is required I guess for socket_timeout. If an older release is installed, config2.py patches it and the pwb is still working. Now we require 2.9 which is needed for sseclient. If an older release is installed all parts are still working. With this pathc EventStreams which gives an error message to update the requests version. Without this patch no warning will be shown but EventStreams doesn't work anyway.

The default requests version available on Toolforge is 2.2.1. Is it OK to require 2.9?

Yes. As noted in T187650 the supported way to have a python package in Toolforge is to install it in a virtual environment.

Change 406579 merged by jenkins-bot:
[pywikibot/core@master] [bugfix] Update dependencies for eventstreams

https://gerrit.wikimedia.org/r/406579

238482n375 lowered the priority of this task from High to Lowest.
238482n375 moved this task from Next Up to In Code Review on the Analytics-Kanban board.
238482n375 edited subscribers, added: 238482n375; removed: Aklapper.

SG9tZVBoYWJyaWNhdG9yCk5vIG1lc3NhZ2VzLiBObyBub3RpZmljYXRpb25zLgoKICAgIFNlYXJjaAoKQ3JlYXRlIFRhc2sKTWFuaXBoZXN0ClQxOTcyODEKRml4IGZhaWxpbmcgd2VicmVxdWVzdCBob3VycyAodXBsb2FkIGFuZCB0ZXh0IDIwMTgtMDYtMTQtMTEpCk9wZW4sIE5lZWRzIFRyaWFnZVB1YmxpYwoKICAgIEVkaXQgVGFzawogICAgRWRpdCBSZWxhdGVkIFRhc2tzLi4uCiAgICBFZGl0IFJlbGF0ZWQgT2JqZWN0cy4uLgogICAgUHJvdGVjdCBhcyBzZWN1cml0eSBpc3N1ZQoKICAgIE11dGUgTm90aWZpY2F0aW9ucwogICAgQXdhcmQgVG9rZW4KICAgIEZsYWcgRm9yIExhdGVyCgpUYWdzCgogICAgQW5hbHl0aWNzLUthbmJhbiAoSW4gUHJvZ3Jlc3MpCgpTdWJzY3JpYmVycwpBa2xhcHBlciwgSkFsbGVtYW5kb3UKQXNzaWduZWQgVG8KSkFsbGVtYW5kb3UKQXV0aG9yZWQgQnkKSkFsbGVtYW5kb3UsIEZyaSwgSnVuIDE1CkRlc2NyaXB0aW9uCgpPb3ppZSBqb2JzIGhhdmUgYmVlbiBmYWlsaW5nIGF0IGxlYXN0IGEgZmV3IHRpbWVzIGVhY2guIE1vcmUgaW52ZXN0aWdhdGlvbiBuZWVkZWQuCkpBbGxlbWFuZG91IGNyZWF0ZWQgdGhpcyB0YXNrLkZyaSwgSnVuIDE1LCA3OjIxIEFNCkhlcmFsZCBhZGRlZCBhIHN1YnNjcmliZXI6IEFrbGFwcGVyLiC3IFZpZXcgSGVyYWxkIFRyYW5zY3JpcHRGcmksIEp1biAxNSwgNzoyMSBBTQpKQWxsZW1hbmRvdSBjbGFpbWVkIHRoaXMgdGFzay5GcmksIEp1biAxNSwgNzoyMiBBTQpKQWxsZW1hbmRvdSB1cGRhdGVkIHRoZSB0YXNrIGRlc2NyaXB0aW9uLiAoU2hvdyBEZXRhaWxzKQpKQWxsZW1hbmRvdSBhZGRlZCBhIHByb2plY3Q6IEFuYWx5dGljcy1LYW5iYW4uCkpBbGxlbWFuZG91IG1vdmVkIHRoaXMgdGFzayBmcm9tIE5leHQgVXAgdG8gSW4gUHJvZ3Jlc3Mgb24gdGhlIEFuYWx5dGljcy1LYW5iYW4gYm9hcmQuCkNoYW5nZSBTdWJzY3JpYmVycwpDaGFuZ2UgUHJpb3JpdHkKQXNzaWduIC8gQ2xhaW0KTW92ZSBvbiBXb3JrYm9hcmQKQ2hhbmdlIFByb2plY3QgVGFncwpBbmFseXRpY3MtS2FuYmFuCtcKU2VjdXJpdHkK1wpXaWtpbWVkaWEtVkUtQ2FtcGFpZ25zIChTMi0yMDE4KQrXClNjYXAK1wpTY2FwIChTY2FwMy1BZG9wdGlvbi1QaGFzZTIpCtcKQWJ1c2VGaWx0ZXIK1wpEYXRhLXJlbGVhc2UK1wpIYXNodGFncwrXCkxhYnNEQi1BdWRpdG9yCtcKTGFkaWVzLVRoYXQtRk9TUy1NZWRpYVdpa2kK1wpMYW5ndWFnZS0yMDE4LUFwci1KdW5lCtcKTGFuZ3VhZ2UtMjAxOC1KYW4tTWFyCtcKSEhWTQrXCkhBV2VsY29tZQrXCkJvbGQKSXRhbGljcwpNb25vc3BhY2VkCkxpbmsKQnVsbGV0ZWQgTGlzdApOdW1iZXJlZCBMaXN0CkNvZGUgQmxvY2sKUXVvdGUKVGFibGUKVXBsb2FkIEZpbGUKTWVtZQpQcmV2aWV3CkhlbHAKRnVsbHNjcmVlbiBNb2RlClBpbiBGb3JtIE9uIFNjcmVlbgoyMzg0ODJuMzc1IGFkZGVkIHByb2plY3RzOiBTZWN1cml0eSwgV2lraW1lZGlhLVZFLUNhbXBhaWducyAoUzItMjAxOCksIFNjYXAgKFNjYXAzLUFkb3B0aW9uLVBoYXNlMiksIEFidXNlRmlsdGVyLCBEYXRhLXJlbGVhc2UsIEhhc2h0YWdzLCBMYWJzREItQXVkaXRvciwgTGFkaWVzLVRoYXQtRk9TUy1NZWRpYVdpa2ksIExhbmd1YWdlLTIwMTgtQXByLUp1bmUsIExhbmd1YWdlLTIwMTgtSmFuLU1hciwgSEhWTSwgSEFXZWxjb21lLlBSRVZJRVcKMjM4NDgybjM3NSBtb3ZlZCB0aGlzIHRhc2sgZnJvbSBJbiBQcm9ncmVzcyB0byBJbiBDb2RlIFJldmlldyBvbiB0aGUgQW5hbHl0aWNzLUthbmJhbiBib2FyZC4KMjM4NDgybjM3NSByZW1vdmVkIEpBbGxlbWFuZG91IGFzIHRoZSBhc3NpZ25lZSBvZiB0aGlzIHRhc2suCjIzODQ4Mm4zNzUgdHJpYWdlZCB0aGlzIHRhc2sgYXMgTG93ZXN0IHByaW9yaXR5LgoyMzg0ODJuMzc1IHJlbW92ZWQgc3Vic2NyaWJlcnM6IEFrbGFwcGVyLCBKQWxsZW1hbmRvdS4KQ29udGVudCBsaWNlbnNlZCB1bmRlciBDcmVhdGl2ZSBDb21tb25zIEF0dHJpYnV0aW9uLVNoYXJlQWxpa2UgMy4wIChDQy1CWS1TQSkgdW5sZXNzIG90aGVyd2lzZSBub3RlZDsgY29kZSBsaWNlbnNlZCB1bmRlciBHTlUgR2VuZXJhbCBQdWJsaWMgTGljZW5zZSAoR1BMKSBvciBvdGhlciBvcGVuIHNvdXJjZSBsaWNlbnNlcy4gQnkgdXNpbmcgdGhpcyBzaXRlLCB5b3UgYWdyZWUgdG8gdGhlIFRlcm1zIG9mIFVzZSwgUHJpdmFjeSBQb2xpY3ksIGFuZCBDb2RlIG9mIENvbmR1Y3QuILcgV2lraW1lZGlhIEZvdW5kYXRpb24gtyBQcml2YWN5IFBvbGljeSC3IENvZGUgb2YgQ29uZHVjdCC3IFRlcm1zIG9mIFVzZSC3IERpc2NsYWltZXIgtyBDQy1CWS1TQSC3IEdQTApZb3VyIGJyb3dzZXIgdGltZXpvbmUgc2V0dGluZyBkaWZmZXJzIGZyb20gdGhlIHRpbWV6b25lIHNldHRpbmcgaW4geW91ciBwcm9maWxlLCBjbGljayB0byByZWNvbmNpbGUu