
Event Logging doesn't seem to handle unicode strings {oryx} [8 pts]
Closed, Resolved, Public

Description

While deploying, we found by accident warnings in the mysql consumer log that say "Invalid string", followed by what look like escaped unicode bytes (e.g. \xF0).
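
The task never establishes what produced the \xF0 in the warning. As a sketch of one possible reading (an assumption, not a confirmed diagnosis): 0xF0 is a lead byte of a four-byte UTF-8 sequence, which is how characters outside the Basic Multilingual Plane are encoded. MySQL's legacy utf8 character set stores at most three bytes per character, so such characters are a commonly seen source of string warnings. The helper name four_byte_chars and the sample text below are hypothetical. The snippet is written for Python 3.

```python
def four_byte_chars(text):
    """Return the characters in text whose UTF-8 encoding is four bytes long."""
    return [ch for ch in text if len(ch.encode("utf-8")) == 4]


# Hypothetical sample: accented Latin letters plus one emoji (U+1F600).
sample = "\u00e4\u00ea \U0001F600"

# The emoji encodes to four bytes, and the first of them is 0xF0.
emoji_bytes = "\U0001F600".encode("utf-8")
first_byte = emoji_bytes[0]

if __name__ == "__main__":
    print(four_byte_chars(sample))
    print(hex(first_byte))
```

A check like this could be run over a captured event to see whether it contains any four-byte characters before blaming the consumer.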

Event Timeline

Milimetric raised the priority of this task from to Needs Triage.
Milimetric updated the task description.
Milimetric added subscribers: Milimetric, mforns.
Restricted Application added a subscriber: Aklapper. May 18 2015, 8:08 PM

We could write a unit test to check whether this is true.
Does it happen at the processor or the consumer level?

We need to try to replicate the error and fix it if it exists.

kevinator renamed this task from Event Logging doesn't seem to handle unicode strings to Event Logging doesn't seem to handle unicode strings {oryx} [8 pts]. May 26 2015, 6:58 PM
kevinator set Security to None.
kevinator triaged this task as Normal priority. Jun 2 2015, 10:05 PM
ggellerman moved this task from Incoming to Tasked on the Analytics-Backlog board.
kevinator raised the priority of this task from Normal to High. Jun 29 2015, 5:33 PM
kevinator moved this task from Tasked to Medium on the Analytics-Backlog board.
kevinator moved this task from Medium to Prioritized on the Analytics-Backlog board.
Milimetric edited projects, added Analytics-Kanban; removed Analytics-Backlog.
Milimetric moved this task from Tasked_Hidden to In Progress on the Analytics-Kanban board.

Steps taken to look for the problem:

  • looked through past 7 days of logs for "Invalid string" and found nothing
  • looked through git history around the time this bug was filed, checked the mysql handler and found no differences that would explain the lack of logging
  • made an input file with äê â﻾a in it and fed it to the consumer, it barfed with:
Traceback (most recent call last):
  File "eventlogging-consumer", line 53, in <module>
    eventlogging.drive(args.input, args.output)
  File "/home/otto/EventLogging/server/eventlogging/factory.py", line 123, in drive
    for event in reader:
  File "/home/otto/EventLogging/server/eventlogging/streams.py", line 100, in <genexpr>
    return (json.loads(dgram) for dgram in iter_unicode(stream))
  File "/usr/lib/python2.7/json/__init__.py", line 338, in loads
    return _default_decoder.decode(s)
  File "/usr/lib/python2.7/json/decoder.py", line 366, in decode
    obj, end = self.raw_decode(s, idx=_w(s, 0).end())
  File "/usr/lib/python2.7/json/decoder.py", line 382, in raw_decode
    obj, end = self.scan_once(s, idx)
ValueError: Invalid control character at: line 1 column 615 (char 614)
  • The above error makes sense, because literal (unescaped) tab characters are not allowed inside JSON strings. I removed the tab and left the unicode and weirdo characters in there, and everything was fine.
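
The behaviour described in the steps above can be reproduced with the standard json module alone. This is a minimal sketch, not the consumer's actual code: the helper try_parse and the sample event lines are hypothetical, and it is written for Python 3, whereas the traceback comes from Python 2.7 (the parser is strict about control characters in both).

```python
import json


def try_parse(line):
    """Return (parsed, None) on success or (None, error message) on failure."""
    try:
        return json.loads(line), None
    except ValueError as exc:  # json.JSONDecodeError is a ValueError subclass
        return None, str(exc)


# Hypothetical event lines. In with_tab the Python escape \t puts a literal
# tab character inside the JSON string; in escaped_tab the JSON text itself
# contains the two characters backslash and t.
with_tab = '{"note": "\u00e4\u00ea\t\u00e2"}'
escaped_tab = '{"note": "\u00e4\u00ea\\t\u00e2"}'
no_tab = '{"note": "\u00e4\u00ea \u00e2"}'

if __name__ == "__main__":
    for line in (with_tab, escaped_tab, no_tab):
        print(try_parse(line))
    # strict=False tells the decoder to accept control characters in strings.
    print(json.loads(with_tab, strict=False))
```

Only the line with the literal tab is rejected; the non-ASCII characters parse without complaint, which matches the conclusion that the unicode content itself was not the problem.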

Conclusion: EL consumer is fine and this bug does not appear to be happening any more.

Milimetric moved this task from In Progress to Done on the Analytics-Kanban board. Jul 29 2015, 8:09 PM
kevinator closed this task as Resolved. Jul 31 2015, 3:22 PM