This [[ https://gerrit.wikimedia.org/r/c/wikimedia-event-utilities/+/804614 | change ]] implements conversions from Event Platform event JSONSchemas to Flink types in both the Table and DataStream APIs.
That change did not implement any conversion from string type to date-time timestamps.
JSONSchema represents this with [[ https://json-schema.org/understanding-json-schema/reference/string.html#dates-and-times | format: date-time ]].
[[ https://datatracker.ietf.org/doc/html/rfc3339#section-5.6 | JSONSchema date-time format ]] supports timestamps that carry timezone information, and Event Platform specifies that we prefer these date-times in UTC 'Z' timezone format, e.g. "2022-05-01T00:00:00Z". JSONSchema date-time will also validate with timezone offsets, e.g. "2022-05-01T00:00:00-05:00".
As [[ https://github.com/apache/flink/blob/master/flink-formats/flink-json/src/main/java/org/apache/flink/formats/json/JsonRowDeserializationSchema.java#L498-L514 | far as I can tell ]], Flink does not really support string date-times with timezone info. It supports timezone-less or local-timezone date-times. The semantics differ only in that local-timezone date-times are stored as UTC timestamps and presented in local time according to the Flink `table.local-time-zone` setting.
Our event data will hopefully have all date-times in 'Z' UTC format, so using local-timezone Flink timestamps will usually be the right thing to do. However, it is possible for date-times to come in with timezone offsets, especially in instrumentation event data submitted from the client side. If we always convert date-time to Flink local-timezone timestamps, these will fail conversion.
I'm not sure of the right thing to do here. We could just keep JSONSchema date-time fields as strings and let users in Flink deal with conversion to timestamp types where needed. It would be nice if this were automated, though.
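For illustration, one way the automated option might look is to parse every date-time string as an offset date-time and normalize it to a UTC instant, which accepts both the 'Z' form and explicit offsets. This is only a rough sketch using plain `java.time`. The class and method names (`DateTimeToInstant`, `toInstant`) are hypothetical and not part of wikimedia-event-utilities or Flink, and wiring the resulting `Instant` into a Flink local-timezone timestamp field is assumed rather than shown.

```java
import java.time.Instant;
import java.time.OffsetDateTime;
import java.time.format.DateTimeFormatter;

// Hypothetical helper for illustration only; not an existing API.
class DateTimeToInstant {

    // Parses a date-time string with either a 'Z' suffix or a numeric
    // offset and returns the corresponding point in time in UTC.
    // Throws java.time.format.DateTimeParseException on malformed input,
    // including strings that have no timezone information at all.
    static Instant toInstant(String dateTime) {
        return OffsetDateTime
            .parse(dateTime, DateTimeFormatter.ISO_OFFSET_DATE_TIME)
            .toInstant();
    }

    public static void main(String[] args) {
        // Preferred Event Platform form: already UTC.
        System.out.println(toInstant("2022-05-01T00:00:00Z"));      // 2022-05-01T00:00:00Z
        // Offset form: normalized to UTC, the original offset is discarded.
        System.out.println(toInstant("2022-05-01T00:00:00-05:00")); // 2022-05-01T05:00:00Z
    }
}
```

The trade-off in this sketch is that the original offset is lost after conversion. If the submitted offset matters to consumers, keeping the field as a string would still be the safer choice.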
Related: {T278467}