Page MenuHomePhabricator

Upgrade eventlogging to Python 3
Closed, ResolvedPublic13 Estimated Story Points

Description

Eventlogging needs to be ported to Python3.

Launch scripts

Many of the bin scripts rely on reload. There is a python 3 equivalent, but the code is doing something weird and no comment was provided, so let's verify whether we need this workaround at all:

import sys
reload(sys)  # noqa

From https://github.com/wikimedia/eventlogging/commit/375ae076ef70d87d7c3b7b9676d0433cd5c139c4 it seems that it was done to force utf-8, we can probably skip it.

Mysql deps

In T212396#4968773 it is stated that the current version of eventlogging supports only the mysql module < 1.4.0, meanwhile for python3 we need a more up to date version.

requirements.txt

Does it need to be changed?

python-dateutil>=1.5
jsonschema>=0.7
confluent-kafka>=0.9.1.2
kafka-python>=1.3.0
mock==1.0.0
mysqlclient
pygments>=1.5
psutil>=2.1.1
PyYAML>=3.10
pyzmq>=2.1
sqlalchemy>=0.7
statsd>=3.0
tornado>=4.0
sprockets.mixins.statsd>=1.3.1
ua_parser>=0.8.0

Docker

Is there anything to do with the Docker directory? Possibly move it to stretch/buster?

Python target version

eventlog1002 is currently running Stretch, so 3.5.3. Buster runs 3.7, but I don't think that we should move to it when upgrading to Python 3 since I'd like to keep the rollback strategy as simple as possible.

Event Timeline

fdans moved this task from Incoming to Operational Excellence on the Analytics board.

@Ottomata do you remember why the reload(sys) stuff was done in the first place? I tried to dig a bit into git blame/log but didn't find anything useful.. If you don't remember I'll dig into the code :)

Edit: 2to3 suggests the following, that may work just fine:

-reload(sys)  # noqa
+import importlib
+importlib.reload(sys)  # noqa

No idea why:
https://github.com/wikimedia/eventlogging/commit/375ae076ef70d87d7c3b7b9676d0433cd5c139c4

It looks like it has something to do with python 2 utf-8, which I'd assume isn't necessary in python 3. I'd try removing the reload stuff altogether first.

No idea why:
https://github.com/wikimedia/eventlogging/commit/375ae076ef70d87d7c3b7b9676d0433cd5c139c4

It looks like it has something to do with python 2 utf-8, which I'd assume isn't necessary in python 3. I'd try removing the reload stuff altogether first.

I think this is a good guess. Here's a random forum post from 2007, recommending exactly the code added in our 2014 patch: https://bytes.com/topic/python/answers/701188-reload-sys

Change 540871 had a related patch set uploaded (by Elukey; owner: Elukey):
[eventlogging@master] [WIP] Move codebase to Python3

https://gerrit.wikimedia.org/r/540871

Luca, if you have trouble migrating eventlogging/service.py stuff, we can probably just remove it. We've removed all eventlogging-service-eventbus instannces in production and in mw vagrant too.

Luca, if you have trouble migrating eventlogging/service.py stuff, we can probably just remove it. We've removed all eventlogging-service-eventbus instannces in production and in mw vagrant too.

All the service.py tests are failing so if the code can be deleted I am in favor of it! I can send a code review to kill that code if it is ok..

Change 541264 had a related patch set uploaded (by Elukey; owner: Elukey):
[eventlogging@master] Remove the Eventlog service codebase

https://gerrit.wikimedia.org/r/541264

Change 541264 merged by Elukey:
[eventlogging@master] Remove the Eventlog service codebase

https://gerrit.wikimedia.org/r/541264

Change 542333 had a related patch set uploaded (by Elukey; owner: Elukey):
[operations/puppet@production] eventlogging::dependencies: add python3 dependencies

https://gerrit.wikimedia.org/r/542333

Change 542333 merged by Elukey:
[operations/puppet@production] eventlogging::dependencies: add python3 dependencies

https://gerrit.wikimedia.org/r/542333

Change 542401 had a related patch set uploaded (by Elukey; owner: Elukey):
[operations/puppet@production] eventlogging::dependencies: remove python3-pykafka dependency

https://gerrit.wikimedia.org/r/542401

Change 542401 merged by Elukey:
[operations/puppet@production] eventlogging::dependencies: remove python3-pykafka dependency

https://gerrit.wikimedia.org/r/542401

Deployed in beta, no errors from python3 but kafka-python seems not doing its work. All the kafka consumers are stuck, this is the stacktrace of all-events (via py-bt in gdb):

(gdb) py-bt
Traceback (most recent call first):
  <built-in method poll of select.epoll object at remote 0x7f639a9ef9f0>
  File "/usr/lib/python3.5/selectors.py", line 445, in select
    fd_event_list = self._epoll.poll(timeout, max_ev)
  File "/usr/lib/python3/dist-packages/kafka/client_async.py", line 630, in _poll
    ready = self._selector.select(timeout)
  File "/usr/lib/python3/dist-packages/kafka/client_async.py", line 598, in poll
    self._poll(timeout / 1000)
  File "/usr/lib/python3/dist-packages/kafka/consumer/group.py", line 692, in _poll_once
    self._client.poll(timeout_ms=timeout_ms)
  File "/usr/lib/python3/dist-packages/kafka/consumer/group.py", line 645, in poll
    records = self._poll_once(remaining, max_records, update_offsets=update_offsets)
  File "/usr/lib/python3/dist-packages/kafka/consumer/group.py", line 1106, in _message_generator_v2
    record_map = self.poll(timeout_ms=timeout_ms, update_offsets=False)
  <built-in method next of module object at remote 0x7f639fadd5e8>
  File "/usr/lib/python3/dist-packages/kafka/consumer/group.py", line 1189, in next_v2
    return next(self._iterator)
  File "/usr/lib/python3/dist-packages/kafka/consumer/group.py", line 1181, in __next__
    return self.next_v2()
  File "/srv/deployment/eventlogging/analytics/eventlogging/handlers.py", line 884, in <genexpr>
    return stream((message.value for message in kafka_consumer), raw)
  File "/srv/deployment/eventlogging/analytics/eventlogging/streams.py", line 94, in <genexpr>
    return (line.decode('utf-8', 'replace') for line in stream)
  File "/srv/deployment/eventlogging/analytics/eventlogging/streams.py", line 101, in <genexpr>
    return (json.loads(dgram) for dgram in iter_unicode(stream))
  File "/srv/deployment/eventlogging/analytics/eventlogging/streams.py", line 108, in <genexpr>
    return (Event(d) for d in iter_json(stream))
  File "/srv/deployment/eventlogging/analytics/eventlogging/factory.py", line 140, in drive
    for event in reader:
  File "/srv/deployment/eventlogging/analytics/bin/eventlogging-consumer", line 61, in <module>
    eventlogging.drive(args.input, args.output)

Tried to switch the kafka consumer to the confluent one, but ended up in a similar situation:

Waiting for the GIL
<built-in method poll of cimpl.Consumer object at remote 0x7f97b7472a68>
File "/srv/deployment/eventlogging/analytics/eventlogging/handlers.py", line 995, in consume
  message = consumer.poll(timeout=timeout)
File "/srv/deployment/eventlogging/analytics/eventlogging/streams.py", line 94, in <genexpr>
  return (line.decode('utf-8', 'replace') for line in stream)
File "/srv/deployment/eventlogging/analytics/eventlogging/streams.py", line 101, in <genexpr>
  return (json.loads(dgram) for dgram in iter_unicode(stream))
File "/srv/deployment/eventlogging/analytics/eventlogging/streams.py", line 108, in <genexpr>
  return (Event(d) for d in iter_json(stream))
File "/srv/deployment/eventlogging/analytics/eventlogging/factory.py", line 141, in drive
  for event in reader:
File "/srv/deployment/eventlogging/analytics/bin/eventlogging-consumer", line 61, in <module>
  eventlogging.drive(args.input, args.output)

PEBCAK of the day: I picked the wrong consumer, since this pulls events from valid mixed, that does't get populated if the processors are not working.. So what are the processors doing?

Oct 11 14:36:28 deployment-eventlog05 eventlogging-processor@client-side-01[31316]: 2019-10-11 14:36:28,422 [31316] (MainThread) root [WARNING] Force a flush of the produce buffer before leaving the kafka-confluent writer.
Oct 11 14:36:28 deployment-eventlog05 eventlogging-processor@client-side-01[31316]: Traceback (most recent call last):
Oct 11 14:36:28 deployment-eventlog05 eventlogging-processor@client-side-01[31316]:   File "/srv/deployment/eventlogging/analytics/bin/eventlogging-processor", line 157, in <module>
Oct 11 14:36:28 deployment-eventlog05 eventlogging-processor@client-side-01[31316]:     w.send(event)
Oct 11 14:36:28 deployment-eventlog05 eventlogging-processor@client-side-01[31316]:   File "/srv/deployment/eventlogging/analytics/eventlogging/handlers.py", line 463, in kafka_confluent_writer
Oct 11 14:36:28 deployment-eventlog05 eventlogging-processor@client-side-01[31316]:     message_topic, message_value, message_key)
Oct 11 14:36:28 deployment-eventlog05 eventlogging-processor@client-side-01[31316]: TypeError: argument 1 must be str, not bytes

Fixed some little issues here and there and updated the code review. Re-deployed in beta, the service seems now working fine!

Change 540871 merged by Elukey:
[eventlogging@master] Move codebase to Python3

https://gerrit.wikimedia.org/r/540871

Tried to deploy in prod, but had to rollback due to the following exception logged in the processors (causing them to fail and rebalance etc..):

Oct 17 10:22:39 eventlog1002 eventlogging-processor@client-side-01[18540]: 2019-10-17 10:22:39,049 [18540] (MainThread) root [ERROR] Unable to create EventError object: b'{"dt": "2019-10-17T10:17:57Z", "event": {"deepcategory": false, "filetype": false, "hastemplate": false, "inlanguage": false, "intitle": false, "not": false, "or": false, "phrase": false, "plain": false, "sort": false, "subpageof": false}, "ip": "XX.XX.XX.XX", "recvFrom": "cp3041.esams.wmnet", "revision": 18227136, "schema": "AdvancedSearchRequest", "seqId": 286400128, "userAgent": {"browser_family": "Chrome", "browser_major": "77", "browser_minor": "0", "device_family": "Other", "is_bot": false, "is_mediawiki": false, "os_family": "Windows", "os_major": "10", "os_minor": null, "wmf_app_version": "-"}, "uuid": "XXXXXXXX", "webHost": "es.wikipedia.org", "wiki": "eswiki"}' is not JSON serializable
Oct 17 10:22:39 eventlog1002 eventlogging-processor@client-side-01[18540]: Traceback (most recent call last):
Oct 17 10:22:39 eventlog1002 eventlogging-processor@client-side-01[18540]:   File "/srv/deployment/eventlogging/analytics/bin/eventlogging-processor", line 157, in <module>
Oct 17 10:22:39 eventlog1002 eventlogging-processor@client-side-01[18540]:     w.send(event)
Oct 17 10:22:39 eventlog1002 eventlogging-processor@client-side-01[18540]: StopIteration

Change 543832 had a related patch set uploaded (by Elukey; owner: Elukey):
[eventlogging@master] handlers.py: avoid to encode() a kafka messag

https://gerrit.wikimedia.org/r/543832

Quick test:

elukey@eventlog1002:~$ python3
Python 3.5.3 (default, Sep 27 2018, 17:25:39)
[GCC 6.3.0 20170516] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import json
>>> a = '{"dt": "2019-10-17T10:17:57Z", "event": {"deepcategory": false, "filetype": false, "hastemplate": false, "inlanguage": false, "intitle": false, "not": false, "or": false, "phrase": false, "plain": false, "sort": false, "subpageof": false}, "ip": "XX.XX.XX.XX", "recvFrom": "cp3041.esams.wmnet", "revision": 18227136, "schema": "AdvancedSearchRequest", "seqId": 286400128, "userAgent": {"browser_family": "Chrome", "browser_major": "77", "browser_minor": "0", "device_family": "Other", "is_bot": false, "is_mediawiki": false, "os_family": "Windows", "os_major": "10", "os_minor": null, "wmf_app_version": "-"}, "uuid": "XXXXXXXX", "webHost": "es.wikipedia.org", "wiki": "eswiki"}'
>>> json.dumps(a)
'"{\\"dt\\": \\"2019-10-17T10:17:57Z\\", \\"event\\": {\\"deepcategory\\": false, \\"filetype\\": false, \\"hastemplate\\": false, \\"inlanguage\\": false, \\"intitle\\": false, \\"not\\": false, \\"or\\": false, \\"phrase\\": false, \\"plain\\": false, \\"sort\\": false, \\"subpageof\\": false}, \\"ip\\": \\"XX.XX.XX.XX\\", \\"recvFrom\\": \\"cp3041.esams.wmnet\\", \\"revision\\": 18227136, \\"schema\\": \\"AdvancedSearchRequest\\", \\"seqId\\": 286400128, \\"userAgent\\": {\\"browser_family\\": \\"Chrome\\", \\"browser_major\\": \\"77\\", \\"browser_minor\\": \\"0\\", \\"device_family\\": \\"Other\\", \\"is_bot\\": false, \\"is_mediawiki\\": false, \\"os_family\\": \\"Windows\\", \\"os_major\\": \\"10\\", \\"os_minor\\": null, \\"wmf_app_version\\": \\"-\\"}, \\"uuid\\": \\"XXXXXXXX\\", \\"webHost\\": \\"es.wikipedia.org\\", \\"wiki\\": \\"eswiki\\"}"'

>>> a = b'{"dt": "2019-10-17T10:17:57Z", "event": {"deepcategory": false, "filetype": false, "hastemplate": false, "inlanguage": false, "intitle": false, "not": false, "or": false, "phrase": false, "plain": false, "sort": false, "subpageof": false}, "ip": "XX.XX.XX.XX", "recvFrom": "cp3041.esams.wmnet", "revision": 18227136, "schema": "AdvancedSearchRequest", "seqId": 286400128, "userAgent": {"browser_family": "Chrome", "browser_major": "77", "browser_minor": "0", "device_family": "Other", "is_bot": false, "is_mediawiki": false, "os_family": "Windows", "os_major": "10", "os_minor": null, "wmf_app_version": "-"}, "uuid": "XXXXXXXX", "webHost": "es.wikipedia.org", "wiki": "eswiki"}'
>>> json.dumps(a)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/lib/python3.5/json/__init__.py", line 230, in dumps
    return _default_encoder.encode(obj)
  File "/usr/lib/python3.5/json/encoder.py", line 198, in encode
    chunks = self.iterencode(o, _one_shot=True)
  File "/usr/lib/python3.5/json/encoder.py", line 256, in iterencode
    return _iterencode(o, 0)
  File "/usr/lib/python3.5/json/encoder.py", line 179, in default
    raise TypeError(repr(o) + " is not JSON serializable")
TypeError: b'{"dt": "2019-10-17T10:17:57Z", "event": {"deepcategory": false, "filetype": false, "hastemplate": false, "inlanguage": false, "intitle": false, "not": false, "or": false, "phrase": false, "plain": false, "sort": false, "subpageof": false}, "ip": "XX.XX.XX.XX", "recvFrom": "cp3041.esams.wmnet", "revision": 18227136, "schema": "AdvancedSearchRequest", "seqId": 286400128, "userAgent": {"browser_family": "Chrome", "browser_major": "77", "browser_minor": "0", "device_family": "Other", "is_bot": false, "is_mediawiki": false, "os_family": "Windows", "os_major": "10", "os_minor": null, "wmf_app_version": "-"}, "uuid": "XXXXXXXX", "webHost": "es.wikipedia.org", "wiki": "eswiki"}' is not JSON serializable

Change 543832 abandoned by Elukey:
handlers.py: avoid to encode() a kafka messag

https://gerrit.wikimedia.org/r/543832

Change 543838 had a related patch set uploaded (by Elukey; owner: Elukey):
[eventlogging@master] eventlogging-processor: avoid to encode() an event when validation fails

https://gerrit.wikimedia.org/r/543838

Change 543838 abandoned by Elukey:
eventlogging-processor: avoid to encode() an event when validation fails

https://gerrit.wikimedia.org/r/543838

Tried to deploy in prod, but had to rollback due to the following exception logged in the processors (causing them to fail and rebalance etc..):

Oct 17 10:22:39 eventlog1002 eventlogging-processor@client-side-01[18540]: 2019-10-17 10:22:39,049 [18540] (MainThread) root [ERROR] Unable to create EventError object: b'{"dt": "2019-10-17T10:17:57Z", "event": {"deepcategory": false, "filetype": false, "hastemplate": false, "inlanguage": false, "intitle": false, "not": false, "or": false, "phrase": false, "plain": false, "sort": false, "subpageof": false}, "ip": "XX.XX.XX.XX", "recvFrom": "cp3041.esams.wmnet", "revision": 18227136, "schema": "AdvancedSearchRequest", "seqId": 286400128, "userAgent": {"browser_family": "Chrome", "browser_major": "77", "browser_minor": "0", "device_family": "Other", "is_bot": false, "is_mediawiki": false, "os_family": "Windows", "os_major": "10", "os_minor": null, "wmf_app_version": "-"}, "uuid": "XXXXXXXX", "webHost": "es.wikipedia.org", "wiki": "eswiki"}' is not JSON serializable
Oct 17 10:22:39 eventlog1002 eventlogging-processor@client-side-01[18540]: Traceback (most recent call last):
Oct 17 10:22:39 eventlog1002 eventlogging-processor@client-side-01[18540]:   File "/srv/deployment/eventlogging/analytics/bin/eventlogging-processor", line 157, in <module>
Oct 17 10:22:39 eventlog1002 eventlogging-processor@client-side-01[18540]:     w.send(event)
Oct 17 10:22:39 eventlog1002 eventlogging-processor@client-side-01[18540]: StopIteration

The interesting bit that comes before is

Oct 17 10:18:30 eventlog1002 eventlogging-processor@client-side-01[12541]: 2019-10-17 10:18:30,088 [12541] (MainThread) root [ERROR] Unable to validate: <Event ec577f7ca5ad5daa96cdcca23f5a6119 of schema ('AdvancedSearchRequest
', 18227136)> (Additional properties are not allowed ('sort' was unexpected))
Oct 17 10:18:30 eventlog1002 eventlogging-processor@client-side-01[12541]: 2019-10-17 10:18:30,089 [12541] (MainThread) root [WARNING] Force a flush of the produce buffer before leaving the kafka-confluent writer.

So If I am not mistaken, the call trace is:

eventlogging-processor -> for raw_event in get_reader(args.input): -> ValidationError -> write_event_error

The interesting bit in the ValidationError code path could be:

# If we successfully parsed the event, we have an Event object.
# Send it as a serialized JSON string in the EventError's raw_event.
event_string = event
if isinstance(event_string, Event):
    event_string = event.serialize()

serialize() does json.dumps and then encode():

def serialize(self):
    """Serializes this event as utf-8 JSON."""
    return json.dumps(self, sort_keys=True).encode('utf-8')

But the kafka confluent produce code does the same:

message_value = event.encode('utf-8') if raw else \
    json.dumps(event, sort_keys=True).encode('utf-8')

This could explain the problem. I am still trying to figure out why this doesn't show up in deployment-prep.

Change 543851 had a related patch set uploaded (by Elukey; owner: Elukey):
[eventlogging@master] eventlogging-processor: use e.args[0] instead of str(e)

https://gerrit.wikimedia.org/r/543851

Change 543851 merged by Elukey:
[eventlogging@master] eventlogging-processor: use e.args[0] instead of str(e)

https://gerrit.wikimedia.org/r/543851

@Ottomata this is surely unrelated, but should we upgrade librdkafka on eventlog1002?

elukey@eventlog1002:~$ apt-cache policy librdkafka1
librdkafka1:
  Installed: 0.11.5-1~bpo9+1
  Candidate: 0.11.6-1~bpo9+1+wikimedia1
  Version table:
     0.11.6-1~bpo9+1+wikimedia1 1001
       1001 http://apt.wikimedia.org/wikimedia stretch-wikimedia/main amd64 Packages
     0.11.6-1~bpo9+1 1001
        100 http://mirrors.wikimedia.org/debian stretch-backports/main amd64 Packages
 *** 0.11.5-1~bpo9+1 100
        100 /var/lib/dpkg/status
     0.9.3-1 500
        500 http://mirrors.wikimedia.org/debian stretch/main amd64 Packages

I am not sure whether we solved the decoding issue but * i think* we just need json.dumps(str(a,'utf-8')) in python3?

serialize() does json.dumps and then encode():
But the kafka confluent produce code does the same:

Those are actually two different things! Likely yes, the Kafka code should just call event.serialize instead of doing json.dumps .. itself. However, event_string = event.serialize() is wrapping the original event inside of an EventError event, as a json string. So, there shouldn't be any double encoding going on here.

I am not sure whether we solved the decoding issue but * i think* we just need json.dumps(str(a,'utf-8')) in python3?

This may lead to errors:

EDIT: Nuria's point made me realize that I got my json serialize tests wrong, since json.dumps() needs to get a dictionary (like Event) rather than a string. I'll redo my tests in light of this :)

>>> import json
>>> a = {'test': b'test1'}

>>> json.dumps(a).encode()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/lib/python3.5/json/__init__.py", line 230, in dumps
    return _default_encoder.encode(obj)
  File "/usr/lib/python3.5/json/encoder.py", line 198, in encode
    chunks = self.iterencode(o, _one_shot=True)
  File "/usr/lib/python3.5/json/encoder.py", line 256, in iterencode
    return _iterencode(o, 0)
  File "/usr/lib/python3.5/json/encoder.py", line 179, in default
    raise TypeError(repr(o) + " is not JSON serializable")
TypeError: b'test1' is not JSON serializable

>>> json.dumps(str(a)).encode()
b'"{\'test\': b\'test1\'}"'

Let's restart from the original exception:

Oct 17 10:18:09 eventlog1002 eventlogging-processor@client-side-01[11799]: 2019-10-17 10:18:09,154 [11799] (MainThread) root [ERROR] Unable to validate: <Event 95d07a84156554c18f015c538118adea of schema ('AdvancedSearchRequest', 18227136)> (Additional properties are not allowed ('sort' was unexpected))

Oct 17 10:18:09 eventlog1002 eventlogging-processor@client-side-01[11799]: 2019-10-17 10:18:09,155 [11799] (MainThread) root [WARNING] Force a flush of the produce buffer before leaving the kafka-confluent writer.

Oct 17 10:18:09 eventlog1002 eventlogging-processor@client-side-01[11799]: 2019-10-17 10:18:09,155 [11799] (MainThread) root [ERROR] Unable to create EventError object: 
b'{
  "dt": "2019-10-17T10:17:55Z", 
  "event": {"deepcategory": false, "filetype": false, "hastemplate": false, "inlanguage": false, "intitle": false, "not": false, "or": false, "phrase": false, "plain": false, "sort": false, "subpageof": false}, 
  "ip": "1.1.1.1", 
  "recvFrom": "cp3042.esams.wmnet", 
  "revision": 18227136, 
  "schema": "AdvancedSearchRequest", 
  "seqId": 279579402, 
  "userAgent": {"browser_family": "Chrome", "browser_major": "77", "browser_minor": "0", "device_family": "Other", "is_bot": false, "is_mediawiki": false, "os_family": "Windows", "os_major": "10", "os_minor": null, "wmf_app_version": "-"}, 
  "uuid": "XXXXXXX", 
  "webHost": "it.wikipedia.org",
  "wiki": "itwiki"}' 
  is not JSON serializable

The above bytes string is very different from the EvenError format:

return Event({
    'schema': ERROR_SCID[0],
    'revision': ERROR_SCID[1],
    'wiki': '',
    'uuid': '%032x' % uuid.uuid1().int,
    'recvFrom': socket.getfqdn(),
    'timestamp': int(round(time.time())),
    'event': {
        'rawEvent': raw_event,
        'message': error_message,
        'code': error_code,
        'schema': errored_schema,
        'revision': errored_revision
    }
})

So a possible theory is that rawEvent in the EventError object sent to kafka-confluent-writer contains a byte string (like indicated in the exception), and the json.dumps().encode() fails when trying to serialize it.

But this is in theory what it is meant to happen:

except ValidationError as e:
    logging.error('Unable to validate: %s (%s)', event, e.args[0])
    # If we successfully parsed the event, we have an Event object.
    # Send it as a serialized JSON string in the EventError's raw_event.
    event_string = event
    if isinstance(event_string, Event):
        event_string = event.serialize()

Change 544144 had a related patch set uploaded (by Elukey; owner: Elukey):
[eventlogging@master] event.py: avoid encoding a json string

https://gerrit.wikimedia.org/r/544144

https://gerrit.wikimedia.org/r/#/c/eventlogging/+/544144/ is my best guess about the problem in production, assuming that deployment-prep is currently not very representative of what traffic flows in prod. I deployed the patch to deployment-prep and eventlogging seems working fine.

Sounds good. Q: Do we even need the .encode('utf-8') in the kafka handler? Perhaps that is now unnecessary in Python 3?

Perhaps we should remove all .encode('utf-8') calls? Not sure.

Sounds good. Q: Do we even need the .encode('utf-8') in the kafka handler? Perhaps that is now unnecessary in Python 3?

I thought about it as well, but from https://docs.confluent.io/current/clients/confluent-kafka-python/#confluent_kafka.Producer.produce it seems that produce() requires a bytes string, so the encode() should be right. The other occurrences that I found were related to sending stuff to the network, so in theory they should be ok as well.

Left comments on patch

Given the

if isinstance(event_string, Event):

The Event is a dictionary (always, otherwise we do not serialize) but it might contain a key that is itself a byte string right?

Left comments on patch

Given the

if isinstance(event_string, Event):

The Event is a dictionary (always, otherwise we do not serialize) but it might contain a key that is itself a byte string right?

I didn't find any explicit encoding to a bytes string in the code, in theory it shouldn't happen. If this was happening then we'd get a stream of errors even when the event is properly validated (since it would trigger a double encoding in the kafka confluent writer) but in prod we saw issues only with EventErrors..

Change 544144 merged by Elukey:
[eventlogging@master] event.py: avoid encoding a json string

https://gerrit.wikimedia.org/r/544144

Deployment done, this time I don't see any error reported in logs or metrics!

Next steps:

  1. Check for a couple of days that the system is stable
  2. clean up python2 packages from puppet and the hosts that use it.

Change 546830 had a related patch set uploaded (by Elukey; owner: Elukey):
[operations/puppet@production] eventlogging::dependencies: absent python2 dependencies

https://gerrit.wikimedia.org/r/546830

Change 546830 merged by Elukey:
[operations/puppet@production] eventlogging::dependencies: absent python2 dependencies

https://gerrit.wikimedia.org/r/546830

Change 546837 had a related patch set uploaded (by Elukey; owner: Elukey):
[operations/puppet@production] eventlogging::dependencies: remove py2 dependencies

https://gerrit.wikimedia.org/r/546837

Change 546837 merged by Elukey:
[operations/puppet@production] eventlogging::dependencies: remove py2 dependencies

https://gerrit.wikimedia.org/r/546837

elukey set the point value for this task to 13.Oct 29 2019, 8:37 AM
elukey moved this task from In Progress to Done on the Analytics-Kanban board.