Clients may need to generate a UUID for sending events.
Closed, ResolvedPublic

Description

We may opt to remove the meta.id property from analytics events, but that may not hold for other types of event, or we may wish to revisit this policy. Currently MW provides no method for generating a client-side UUID, although mw.user.generateSessionId() does generate an 80-bit pseudorandom integer, represented as a string of hexadecimal digits. Two calls to this method would yield enough random bits to format a UUIDv4 value, or generateSessionId could be parametrized to make it easier to obtain the required number of bits of randomness in one call. A UUIDv4 function would then need to do little more than set the version and variant bits and insert the '-' characters at the proper places.
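The formatting step described above can be sketched as follows. This is a minimal illustration, not existing MediaWiki code: formatUuidV4 is a hypothetical name, and it assumes the caller supplies at least 32 hexadecimal digits, for example by concatenating two generateSessionId() results (20 digits each). Besides inserting the hyphens, it overwrites the version digit and the variant bits so the result is a well-formed UUIDv4.

```javascript
// Hypothetical helper (not part of MediaWiki): turns at least 32 hex digits
// of randomness into a UUIDv4-formatted string.
function formatUuidV4(hex) {
  if (!/^[0-9a-f]{32,}$/i.test(hex)) {
    throw new Error('expected at least 32 hexadecimal digits');
  }
  // Use only the first 128 bits (32 hex digits); extra digits are discarded.
  const digits = hex.slice(0, 32).toLowerCase().split('');
  // Version: the 13th hex digit of a UUIDv4 is always 4.
  digits[12] = '4';
  // Variant: the top two bits of the 17th hex digit are 10, giving 8, 9, a or b.
  digits[16] = ((parseInt(digits[16], 16) & 0x3) | 0x8).toString(16);
  const s = digits.join('');
  // Insert the '-' separators in the 8-4-4-4-12 layout.
  return [
    s.slice(0, 8),
    s.slice(8, 12),
    s.slice(12, 16),
    s.slice(16, 20),
    s.slice(20, 32)
  ].join('-');
}

// Assumed usage in a MediaWiki client, where mw.user is available:
//   formatUuidV4(mw.user.generateSessionId() + mw.user.generateSessionId());
```

Six of the 128 bits are fixed by the version and variant fields, so 122 bits of the input randomness survive in the result.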

Event Timeline

If anything, I think we should remove the format qualifier from the JSONSchema for this field. I think this id should be usable for whatever id the client thinks makes the most sense for their event.

Has meta.id been made an explicit requirement for all events sent to EventGate? I thought it was okay to have it as optional, to be used when the client is known to send duplicates (e.g. IE6)?

It is currently a requirement in the common schema's meta field. We could remove the requirement if we had to. But! We decided that we have to have EventGate augment events anyway, so we might as well set a meta.id UUID there if one is not provided by the client.
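A minimal sketch of that fallback, for illustration only: ensureMetaId is a hypothetical name and this is not EventGate's actual code. It assumes a Node.js runtime where crypto.randomUUID is available, keeps any id the client supplied, and otherwise fills in a fresh UUIDv4.

```javascript
const { randomUUID } = require('crypto');

// Hypothetical helper (not EventGate's real implementation): returns the
// event unchanged if the client provided meta.id, otherwise returns a copy
// with a generated id. The generator is injectable to ease testing.
function ensureMetaId(event, generateId = randomUUID) {
  const meta = event.meta || {};
  if (typeof meta.id === 'string' && meta.id.length > 0) {
    return event; // client-provided id wins
  }
  // Copy rather than mutate, so the caller's object is left untouched.
  return { ...event, meta: { ...meta, id: generateId() } };
}
```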

I thought it was okay to have it as optional, to be used when the client is known to send duplicates (e.g. IE6)?

A side note here: our JavaScript runtime has not supported browsers as old as this one for years. The oldest browser supported in the IE family is IE11. See: https://www.mediawiki.org/wiki/Compatibility#Browser_support_matrix
User agents reporting low IE version numbers are likely bot traffic.

I have nothing against UUIDs being sent from the client. Now, in my experience, duplicated events due to browser bugs are rare. There was a notable case in Firefox (see: https://bugzilla.mozilla.org/show_bug.cgi?id=137976), but more often than not, duplicates are caused by problems in the instrumentation.

If anything, I think we should remove the format qualifier from the JSONSchema for this field. I think this id should be usable for whatever id the client thinks makes the most sense for their event.

I think this is a good idea for cases like the ones we're dealing with where there isn't really a need for de-duplication, and so rather than send well-formatted junk, we can just send junk. But I would sooner remove the requirement altogether rather than send junk at all, which is confusing.

We decided that we have to have EventGate augment events anyway, so we might as well set a meta.id UUID there if one is not provided by the client.

I have nothing against UUIDs being sent from the client. Now, in my experience, duplicated events due to browser bugs are rare. There was a notable case in Firefox (see: https://bugzilla.mozilla.org/show_bug.cgi?id=137976), but more often than not, duplicates are caused by problems in the instrumentation.

Yes, as we've discussed plenty of times, I think it is a bit wasteful to have analytics clients generate UUIDs per event. The sole use case that I've been informed of is the detection of duplicate events sent (usually) as the result of browser errors. Providing a server-side default generation of such a UUID would, obviously, not detect duplicates. I think the de-duplication problem can be handled as a QA issue downstream of EG, unless there are some numbers showing that it is a truly widespread issue.

The sole use case that I've been informed of is the detection of duplicate events sent (usually) as the result of browser errors.

Instrumentation errors, rather, because we really have no reports of browsers (in the new world) sending duplicate events, other than pretty major bugs like the one linked above. The UUID is not going to help much with duplicates that originate from instrumentation issues, because they would "look" like different events with different IDs.

The sole use case

It also helps in troubleshooting problems when they happen. EventGate uses the id field for logging error messages as well, so it helps to be able to correlate logging messages to the events that cause the problem.

Providing a server-side default generation of such a UUID would, obviously, not detect duplicates

It would not detect duplicates created by a client producing the same event twice, but it does help with duplicates that arise for other reasons: Kafka only guarantees at-least-once delivery.
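To illustrate the point: under at-least-once delivery the same message can reach a consumer more than once, and an id assigned before the event enters Kafka lets the consumer drop the repeats. The sketch below is hypothetical (dedupeByMetaId is an illustrative name, and a real consumer would need a bounded or persistent store rather than an in-memory Set).

```javascript
// Hypothetical consumer-side helper: keeps the first event seen for each
// meta.id and drops later redeliveries of the same id.
function dedupeByMetaId(events) {
  const seen = new Set();
  const unique = [];
  for (const event of events) {
    const id = event.meta && event.meta.id;
    if (id === undefined) {
      // Without an id there is nothing to compare on, so keep the event.
      unique.push(event);
      continue;
    }
    if (seen.has(id)) {
      continue; // redelivered copy of an event already processed
    }
    seen.add(id);
    unique.push(event);
  }
  return unique;
}
```

As noted above, this only catches redeliveries of one stored event. Two events that a client emitted separately would get different ids and both pass through.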

jlinehan moved this task from Inbox to Done! on the Better Use Of Data board.

Since we have made this field optional (or required with a default behavior of 'generate in EventGate'), there is no longer a use-case here for generating a UUID on the client, so I'm marking this as resolved.