Page MenuHomePhabricator

Flink Tables should have a default ROWTIME column.
Closed, ResolvedPublic

Description

A number of SQL functions (e.g. window aggregation) assume that the table has a timestamp column tagged as ROWTIME. AFAIK it’s something that can be specified either in the DDL or watermaking strategy.

Summary of a discussion I had with @Ottomata. This information could be provided:

  1. By the Catalog.
  2. In the defaults in EventTableDescriptorBuilder in setupKafka

We have some defaults in 2. for setting the watermar which probably should use the same column for ROWTIME. Default being the kafka timestamp

Event Timeline

Restricted Application added a subscriber: Aklapper. · View Herald Transcript

According to the doc we can declare a rowtime field either via DDL or when converting DataStream -> Table.

Eventutilities already has the capability to add a watermark column when building a TableDescriptor. The column carries "event time" Flink semantics (rowtime) as described in https://nightlies.apache.org/flink/flink-docs-release-1.16/docs/dev/table/concepts/time_attributes/.

For this reason I went with option 1 and propose a change to the Catalog's getTable() method to add watermark metadata at run time.
MR at https://gitlab.wikimedia.org/tchin/flink-wmf-event-catalog/-/merge_requests/1/edit.

For this reason I went with option 1 and propose a change to the Catalog's getTable() method to add watermark metadata at run time.

Makes sense!