Modify EventLogging so that all table fields are nullable
Closed, Resolved | Public | 5 Estimated Story Points

Description

Even if an EL schema says a field is required, EL should create a nullable column for that field when creating a new table, so that data purging to enforce data retention policies is possible.


Read more for context here:

Use case:
We need to apply data retention guidelines to the EL database.
To do that, we're developing this script in T156933.
It uses this white-list to decide which tables and fields to purge.
There are two types of purge: full and partial. A full purge deletes whole records after 90 days, whereas a partial purge only sets to NULL the fields of the table that are not in the white-list (also after 90 days).
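
The two purge modes can be sketched roughly as follows. This is only an illustration, not the script from T156933: the `purge` helper, its parameters, the `timestamp` column name and the use of SQLite are all assumptions made to keep the example self-contained.

```python
import sqlite3


def purge(conn, table, whitelist, cutoff, timestamp_col="timestamp"):
    """Hypothetical helper illustrating full vs. partial purge.

    whitelist=None means full purge (delete old rows); otherwise every
    column not in the whitelist is set to NULL on old rows.
    Table and column names are interpolated directly into the SQL, so
    they must come from trusted configuration, never from user input.
    Assumes timestamps are strings that sort chronologically.
    """
    if whitelist is None:
        # Full purge: drop whole records older than the cutoff.
        conn.execute(
            f"DELETE FROM {table} WHERE {timestamp_col} < ?", (cutoff,)
        )
        return

    # Partial purge: find the columns that are not white-listed.
    columns = [row[1] for row in conn.execute(f"PRAGMA table_info({table})")]
    to_null = [c for c in columns if c not in whitelist]
    if to_null:
        assignments = ", ".join(f"{c} = NULL" for c in to_null)
        conn.execute(
            f"UPDATE {table} SET {assignments} WHERE {timestamp_col} < ?",
            (cutoff,),
        )
```

The partial branch is the one that fails as soon as any of the columns in `to_null` was created as non-nullable.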

Issue:
EL schemas allow a field to be marked as "required". EL interprets this and creates the corresponding table with a non-nullable column for that field. This makes total sense, but it prevents the purging script from setting that field to NULL in cases where the field is part of a partially purged schema.

Proposed solution:
Change EL so that all table fields are nullable, even if they are specified to be "required" in the schema definition. The "required" flag would not lose its meaning, because EL's event processor would still validate the event against the schema and check that the received event indeed possesses the required field. Only the database would be more "relaxed" in the way it stores the data, thus allowing the purging script to set values to NULL for all fields.
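
A minimal sketch of the idea, under stated assumptions: the schema is a dict with per-property `"required": True` flags, and `TYPE_MAP`, `column_ddl`, `create_table_sql` and `missing_required` are hypothetical names, not EventLogging's actual code. SQLite is used only so the example runs on its own.

```python
import sqlite3

# Assumed mapping from schema types to SQL column types (illustrative only).
TYPE_MAP = {"string": "TEXT", "integer": "INTEGER", "number": "REAL", "boolean": "INTEGER"}


def column_ddl(name, prop, force_nullable=True):
    """Build one column definition; NOT NULL is emitted only when
    the field is required AND nullability is not being forced."""
    not_null = prop.get("required", False) and not force_nullable
    return f"{name} {TYPE_MAP[prop['type']]}" + (" NOT NULL" if not_null else "")


def create_table_sql(table, schema, force_nullable=True):
    """Build a CREATE TABLE statement from the schema's properties."""
    cols = ", ".join(
        column_ddl(name, prop, force_nullable)
        for name, prop in schema["properties"].items()
    )
    return f"CREATE TABLE {table} ({cols})"


def missing_required(event, schema):
    """Validation still happens at the event level: list the required
    fields that the incoming event does not carry."""
    return [
        name
        for name, prop in schema["properties"].items()
        if prop.get("required", False) and name not in event
    ]
```

With `force_nullable=False` a required field yields a `NOT NULL` column, and a later `UPDATE ... SET field = NULL` is rejected by the database. With the default, the same update succeeds, while `missing_required` keeps rejecting incomplete events before they are stored.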

Derived tasks:

  • Modify EventLogging code so that all generated columns are nullable (THIS TASK)
  • Alter tables in EventLogging database to make all non-nullable columns nullable

Other potential solutions:

  1. Use a garbage value instead of NULL depending on field type, e.g. 0 for numbers, false for booleans, '' for strings, etc. This is the easiest to implement, but it's very inconvenient for querying the tables, because you can't tell whether a value is redacted or genuine.
  2. Do not allow schemas with required fields to be partially purged and force full purge. Also easy to implement, but very inconvenient for schema owners, because they would lose data that could otherwise be kept under the data retention guidelines.
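
To illustrate the querying problem with option 1, here is a small sketch (the `count_zero_and_null` helper and the sample values are made up for the example) that stores one genuine 0, one genuine 5 and one redacted value, first with 0 as the garbage value and then with NULL.

```python
import sqlite3


def count_zero_and_null(values):
    """Load values into a throwaway table and return
    (rows where v = 0, rows where v IS NULL)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE t (v INTEGER)")
    conn.executemany("INSERT INTO t VALUES (?)", [(v,) for v in values])
    zeros = conn.execute("SELECT COUNT(*) FROM t WHERE v = 0").fetchone()[0]
    nulls = conn.execute("SELECT COUNT(*) FROM t WHERE v IS NULL").fetchone()[0]
    return zeros, nulls


# Redacted value stored as the garbage value 0: it is
# indistinguishable from the genuine 0.
with_sentinel = count_zero_and_null([0, 5, 0])    # (2, 0)

# Redacted value stored as NULL: genuine and redacted rows
# can be told apart.
with_null = count_zero_and_null([0, 5, None])     # (1, 1)
```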