When using a Row, we can omit fields we don't need simply by never setting them.
Given an example schema:
```yaml
test:
  type: string
test_int:
  type: int
```
you can create a row:
```java
private final EventRowTypeInfo typeInfo...

Row r = typeInfo.createEmptyRow();
r.setField("test", "test_string");
```
And when serialized, it results in
{ "test": "test_string" }
And only when this event is ingested into Hive do the unset columns get NULL.
But when using RowData, which directly mirrors the SQL schema, unset fields default to NULL in the first place, so an insert into the Flink catalog
```sql
INSERT INTO `example.schema` (`test`) VALUES ('test_string');
```
Results in
{ "test": "test_string", "test_int": null }
This then fails the JSON schema validation before the event can reach the sink.
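For illustration, here is a minimal standalone sketch (outside any catalog or SQL path) of why this happens: a GenericRowData starts with every position set to null, so a JSON serializer working from it cannot tell "never set" apart from "explicitly set to NULL".

```java
import org.apache.flink.table.data.GenericRowData;
import org.apache.flink.table.data.StringData;

public class RowDataNullSketch {
    public static void main(String[] args) {
        // RowData mirroring the two-field schema: (test STRING, test_int INT).
        GenericRowData row = new GenericRowData(2);

        // Only the first position is set; the second is never touched.
        row.setField(0, StringData.fromString("test_string"));

        // The untouched position is already null -- there is no "omitted" state.
        System.out.println(row.isNullAt(1)); // prints: true

        // A serializer walking this row therefore emits an explicit null:
        // { "test": "test_string", "test_int": null }
    }
}
```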
We either need to support removing null object nodes when generating the JSON from RowData, or make it a rule that users of the Flink Catalog must provide values for all fields.
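If we go with the first option, one possible shape for it is a small post-processing step that strips null-valued fields from the generated JSON tree before schema validation. Below is a rough sketch using Jackson, assuming the JSON is available as an ObjectNode somewhere in the serialization path; the stripNulls helper is hypothetical, not existing code.

```java
import com.fasterxml.jackson.databind.JsonNode;
import com.fasterxml.jackson.databind.node.ObjectNode;

import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

public final class NullFieldStripper {

    /**
     * Hypothetical helper: removes top-level fields whose value is an explicit
     * JSON null, so RowData-generated events look like Row-generated ones.
     * Nested objects would need a recursive variant if the schemas require it.
     */
    public static void stripNulls(ObjectNode node) {
        List<String> nullFields = new ArrayList<>();
        Iterator<String> names = node.fieldNames();
        while (names.hasNext()) {
            String name = names.next();
            JsonNode value = node.get(name);
            if (value != null && value.isNull()) {
                nullFields.add(name);
            }
        }
        node.remove(nullFields);
    }
}
```

The downside is that an explicit NULL becomes indistinguishable from an omitted field, which feeds directly into the default-value question below.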
This also calls into question how default values should be handled. If someone omits a field that has a default value, do they want it to take the default, or do they want it to be NULL?