Consider changing EventLogging to encode events using base64 instead of uriEncode
Closed, Declined, Public

Description

When discussing the maximum length of EL events (a consequence of the URL length restriction), I remembered that Pivot/Turnilo uses base64 to encode JSON objects for the URL. The base64 event is actually larger than the raw JSON event (base64 output is about 4/3 the size of its input). But EL does not send the raw JSON; it sends the URI-encoded version, which is (afaik) longer than the base64. See this example (the event has been redacted):

The raw event: 1033 chars

{"dt": "2018-01-01T00:00:00Z", "event": {"action": "extClick", "dom_interactive_time": 1531275743841, "event_offset_time": 8946, "ext_position": 64, "freely_accessible": false, "in_infobox": false, "link_occurrence": 3, "link_text": "\"Some Text\"", "link_url": "https://some.host.org/path1/path2/path3", "mode": "desktop", "namespace_id": 0, "page_id": 16767922, "page_title": "Blah Blah Blah", "page_token": "kjhg395hg8745", "referrer": "https://www.google.com/", "revision_id": 845632768, "section_id": "Some_section", "session_token": "39hg4g8457h4545gh9", "skin": "vector"}, "ip": "11.111.11.11", "recvFrom": "cp5008.eqsin.wmnet", "revision": 13563478, "schema": "CitationUsage", "seqId": 6457542, "userAgent": {"browser_family": "Chrome", "browser_major": "100", "browser_minor": "100", "device_family": "Other", "is_bot": false, "is_mediawiki": false, "os_family": "Windows", "os_major": "10", "os_minor": null, "wmf_app_version": "-"}, "uuid": "4890tu394583498h4g9345yhguoi4", "webHost": "en.wikipedia.org", "wiki": "enwiki"}

The current URI encoded event: 1641 chars

%7B%22dt%22%3A%20%222018-01-01T00%3A00%3A00Z%22%2C%20%22event%22%3A%20%7B%22action%22%3A%20%22extClick%22%2C%20%22dom_interactive_time%22%3A%201531275743841%2C%20%22event_offset_time%22%3A%208946%2C%20%22ext_position%22%3A%2064%2C%20%22freely_accessible%22%3A%20false%2C%20%22in_infobox%22%3A%20false%2C%20%22link_occurrence%22%3A%203%2C%20%22link_text%22%3A%20%22%22Some%20Text%22%22%2C%20%22link_url%22%3A%20%22https%3A%2F%2Fsome.host.org%2Fpath1%2Fpath2%2Fpath3%22%2C%20%22mode%22%3A%20%22desktop%22%2C%20%22namespace_id%22%3A%200%2C%20%22page_id%22%3A%2016767922%2C%20%22page_title%22%3A%20%22Blah%20Blah%20Blah%22%2C%20%22page_token%22%3A%20%22kjhg395hg8745%22%2C%20%22referrer%22%3A%20%22https%3A%2F%2Fwww.google.com%2F%22%2C%20%22revision_id%22%3A%20845632768%2C%20%22section_id%22%3A%20%22Some_section%22%2C%20%22session_token%22%3A%20%2239hg4g8457h4545gh9%22%2C%20%22skin%22%3A%20%22vector%22%7D%2C%20%22ip%22%3A%20%2211.111.11.11%22%2C%20%22recvFrom%22%3A%20%22cp5008.eqsin.wmnet%22%2C%20%22revision%22%3A%2013563478%2C%20%22schema%22%3A%20%22CitationUsage%22%2C%20%22seqId%22%3A%206457542%2C%20%22userAgent%22%3A%20%7B%22browser_family%22%3A%20%22Chrome%22%2C%20%22browser_major%22%3A%20%22100%22%2C%20%22browser_minor%22%3A%20%22100%22%2C%20%22device_family%22%3A%20%22Other%22%2C%20%22is_bot%22%3A%20false%2C%20%22is_mediawiki%22%3A%20false%2C%20%22os_family%22%3A%20%22Windows%22%2C%20%22os_major%22%3A%20%2210%22%2C%20%22os_minor%22%3A%20null%2C%20%22wmf_app_version%22%3A%20%22-%22%7D%2C%20%22uuid%22%3A%20%224890tu394583498h4g9345yhguoi4%22%2C%20%22webHost%22%3A%20%22en.wikipedia.org%22%2C%20%22wiki%22%3A%20%22enwiki%22%7D

The base64 event: 1376 chars

eyJkdCI6ICIyMDE4LTAxLTAxVDAwOjAwOjAwWiIsICJldmVudCI6IHsiYWN0aW9uIjogImV4dENsaWNrIiwgImRvbV9pbnRlcmFjdGl2ZV90aW1lIjogMTUzMTI3NTc0Mzg0MSwgImV2ZW50X29mZnNldF90aW1lIjogODk0NiwgImV4dF9wb3NpdGlvbiI6IDY0LCAiZnJlZWx5X2FjY2Vzc2libGUiOiBmYWxzZSwgImluX2luZm9ib3giOiBmYWxzZSwgImxpbmtfb2NjdXJyZW5jZSI6IDMsICJsaW5rX3RleHQiOiAiIlNvbWUgVGV4dCIiLCAibGlua191cmwiOiAiaHR0cHM6Ly9zb21lLmhvc3Qub3JnL3BhdGgxL3BhdGgyL3BhdGgzIiwgIm1vZGUiOiAiZGVza3RvcCIsICJuYW1lc3BhY2VfaWQiOiAwLCAicGFnZV9pZCI6IDE2NzY3OTIyLCAicGFnZV90aXRsZSI6ICJCbGFoIEJsYWggQmxhaCIsICJwYWdlX3Rva2VuIjogImtqaGczOTVoZzg3NDUiLCAicmVmZXJyZXIiOiAiaHR0cHM6Ly93d3cuZ29vZ2xlLmNvbS8iLCAicmV2aXNpb25faWQiOiA4NDU2MzI3NjgsICJzZWN0aW9uX2lkIjogIlNvbWVfc2VjdGlvbiIsICJzZXNzaW9uX3Rva2VuIjogIjM5aGc0Zzg0NTdoNDU0NWdoOSIsICJza2luIjogInZlY3RvciJ9LCAiaXAiOiAiMTEuMTExLjExLjExIiwgInJlY3ZGcm9tIjogImNwNTAwOC5lcXNpbi53bW5ldCIsICJyZXZpc2lvbiI6IDEzNTYzNDc4LCAic2NoZW1hIjogIkNpdGF0aW9uVXNhZ2UiLCAic2VxSWQiOiA2NDU3NTQyLCAidXNlckFnZW50IjogeyJicm93c2VyX2ZhbWlseSI6ICJDaHJvbWUiLCAiYnJvd3Nlcl9tYWpvciI6ICIxMDAiLCAiYnJvd3Nlcl9taW5vciI6ICIxMDAiLCAiZGV2aWNlX2ZhbWlseSI6ICJPdGhlciIsICJpc19ib3QiOiBmYWxzZSwgImlzX21lZGlhd2lraSI6IGZhbHNlLCAib3NfZmFtaWx5IjogIldpbmRvd3MiLCAib3NfbWFqb3IiOiAiMTAiLCAib3NfbWlub3IiOiBudWxsLCAid21mX2FwcF92ZXJzaW9uIjogIi0ifSwgInV1aWQiOiAiNDg5MHR1Mzk0NTgzNDk4aDRnOTM0NXloZ3VvaTQiLCAid2ViSG9zdCI6ICJlbi53aWtpcGVkaWEub3JnIiwgIndpa2kiOiAiZW53aWtpIn0=

So for this specific case the base64 event is around 20% shorter than the current URI-encoded one. The gain is not huge, though, and we lose readability, which is an important factor when grepping logs, although there is probably a solution for that. So, should we look deeper into this?
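To try the comparison on other events, a minimal sketch in Python follows. The shortened `event` dict and the `encoded_lengths` helper are hypothetical, and `urllib.parse.quote` with `safe=""` only approximates what the EL client does when it URI-encodes an event, so treat the numbers as indicative rather than exact.

```python
import base64
import json
from urllib.parse import quote

# Hypothetical, shortened event; the field names echo the redacted example.
event = {
    "schema": "CitationUsage",
    "wiki": "enwiki",
    "event": {
        "action": "extClick",
        "link_url": "https://some.host.org/path1/path2/path3",
    },
}


def encoded_lengths(obj):
    """Return the length of obj as raw JSON, percent-encoded JSON and base64."""
    raw = json.dumps(obj)
    # Percent-encode everything except unreserved characters.
    uri = quote(raw, safe="")
    # Standard base64 alphabet, with "=" padding.
    b64 = base64.b64encode(raw.encode("utf-8")).decode("ascii")
    return {"raw": len(raw), "uri": len(uri), "base64": len(b64)}


lengths = encoded_lengths(event)
print(lengths)
print("base64 / uri ratio:", round(lengths["base64"] / lengths["uri"], 3))
```

The ratio depends on how much of the payload is punctuation: every quote, colon, comma, brace and space costs 3 chars when percent-encoded, while base64 costs a flat 4 chars per 3 input bytes.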

Event Timeline

fdans moved this task from Incoming to Smart Tools for Better Data on the Analytics board.

I do not think this is a possibility. Base64 encoding includes characters, such as "/", that are illegal in URL encoding. Correct?

Base64 uses a-z, A-Z, 0-9, +, and /. So all except / are 'legal'. I bet Pivot/Turnilo URI-encode the base64 string to avoid problems with the /. This would add 2 extra chars for every / (its frequency being 1/64 on average), so about a 3% increase. Theoretically, then, using base64 + uriEncoding would be ~17% (not ~20%) shorter than using uriEncoding only.
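A small Python sketch of the escaping overhead discussed here, under stated assumptions: the helper names are hypothetical, and nothing below is taken from Pivot/Turnilo or EventLogging code. One thing it makes visible is that a strict percent-encoder (here `urllib.parse.quote` with `safe=""`) escapes `+` and the `=` padding as well as `/`, so the real overhead may be somewhat higher than the estimate above. The URL-safe alphabet from RFC 4648, which swaps `+` and `/` for `-` and `_`, would avoid the escaping step altogether.

```python
import base64
from urllib.parse import quote


def b64_variants(payload: bytes):
    """Return (standard base64, percent-encoded base64, unpadded URL-safe base64)."""
    std = base64.b64encode(payload).decode("ascii")
    # With safe="", "+", "/" and "=" become %2B, %2F and %3D.
    std_escaped = quote(std, safe="")
    # URL-safe alphabet uses "-" and "_"; the "=" padding can be dropped.
    urlsafe = base64.urlsafe_b64encode(payload).decode("ascii").rstrip("=")
    return std, std_escaped, urlsafe


def b64url_decode(text: str) -> bytes:
    """Decode unpadded URL-safe base64 by restoring the missing padding."""
    padding = "=" * (-len(text) % 4)
    return base64.urlsafe_b64decode(text + padding)


# Two bytes chosen so that the standard encoding contains both "+" and "/".
std, std_escaped, urlsafe = b64_variants(b"\xfb\xff")
print(std, std_escaped, urlsafe)
```

With the URL-safe variant the encoded event needs no further escaping, so the length stays at roughly 4/3 of the raw JSON.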

Milimetric subscribed.

Declining in favor of letting Modern Event Platform handle this.