Currently WikibaseQualityConstraints stores its data in memcached, but there is no guarantee of how long that data will remain, as it can be evicted at any time.
To make the data more persistent, we want to store it in its own SQL DB table.
The schema could probably be something like ( entityId, timestampLastUpdated, blob ).
TBA: How much data will this add to the wikidatawiki DB?
TBA: Can we use Cassandra for the actual blob storage, keeping just the index of entities and dates in SQL?
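To make the proposed ( entityId, timestampLastUpdated, blob ) shape concrete, here is a minimal sketch using Python's sqlite3 module as a stand-in for the real database. The table name `wbqc_check_results`, the column names, the column types and the helper functions are all assumptions for illustration; the actual schema is still subject to DBA review.

```python
import sqlite3

# Hypothetical names and types: the task only proposes
# ( entityId, timestampLastUpdated, blob ).
SCHEMA = """
CREATE TABLE IF NOT EXISTS wbqc_check_results (
    entity_id TEXT NOT NULL PRIMARY KEY,
    timestamp_last_updated TEXT NOT NULL,
    result_blob BLOB NOT NULL
)
"""


def open_store(path=":memory:"):
    """Open a connection and make sure the table exists."""
    conn = sqlite3.connect(path)
    conn.execute(SCHEMA)
    return conn


def save_result(conn, entity_id, timestamp, blob):
    """Insert the result for an entity, or overwrite the existing row."""
    conn.execute(
        "INSERT OR REPLACE INTO wbqc_check_results VALUES (?, ?, ?)",
        (entity_id, timestamp, blob),
    )
    conn.commit()


def load_result(conn, entity_id):
    """Return (timestamp, blob) for an entity, or None if nothing is stored."""
    return conn.execute(
        "SELECT timestamp_last_updated, result_blob "
        "FROM wbqc_check_results WHERE entity_id = ?",
        (entity_id,),
    ).fetchone()
```

Keying the table on the entity ID means there is one row per entity, so re-running a check replaces the old result instead of accumulating rows.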
What will this allow
- The query service wants to be able to find constraint violations (T192565), so all entities will have to have constraint checks run (T201150) in order to have a complete set of data. There is no point in re-running a constraint check just because its result dropped out of the cache, hence the DB table.
- This will allow dumps of all entity constraint checks, which will make it easy to reload the data into a query service server without making a large number of API requests.
- The 'timestampLastUpdated' field will allow Wikibase to inspect the oldest constraint check data and re-run the checks for very old data points.
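The last point could look roughly like the query below, again sketched in Python with sqlite3 as a stand-in. The table, column and index names are hypothetical, and the sketch assumes fixed-width timestamps (for example 14-digit YYYYMMDDHHMMSS strings) so that string order matches chronological order.

```python
import sqlite3


def oldest_entities(conn, limit):
    """Return up to `limit` entity IDs, least recently checked first."""
    rows = conn.execute(
        "SELECT entity_id FROM check_results "
        "ORDER BY timestamp_last_updated ASC LIMIT ?",
        (limit,),
    ).fetchall()
    return [row[0] for row in rows]


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE check_results ("
    " entity_id TEXT NOT NULL PRIMARY KEY,"
    " timestamp_last_updated TEXT NOT NULL,"
    " result_blob BLOB NOT NULL)"
)
# An index on the timestamp lets the oldest rows be found without a full scan.
conn.execute(
    "CREATE INDEX check_results_timestamp "
    "ON check_results (timestamp_last_updated)"
)
conn.executemany(
    "INSERT INTO check_results VALUES (?, ?, ?)",
    [
        ("Q1", "20180715000000", b"{}"),
        ("Q2", "20180601000000", b"{}"),
        ("Q3", "20180801000000", b"{}"),
    ],
)
stale = oldest_entities(conn, 2)  # candidates for re-running the checks
```

Whether an index on the timestamp column is worth its write cost on a table of this size is one of the things the DBA review would need to weigh.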
Acceptance Criteria
- Configuration switch to change between memcached storage and DB table storage (default to memcached)
- The DB table will have a primary key
- The DB schema should get DBA review once the SQL is somewhere they can see it (follow https://wikitech.wikimedia.org/wiki/Schema_changes#Advice_on_schema_changes)
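The configuration switch in the first criterion could be structured along these lines. This is only an illustrative Python sketch, not the extension's code: the setting name `resultStorage`, its values, and both classes are hypothetical, a plain dict stands in for memcached, and sqlite3 stands in for the DB table.

```python
import sqlite3


class CacheStore:
    """Stand-in for memcached storage (a real cache may evict entries)."""

    def __init__(self):
        self._data = {}

    def set(self, entity_id, blob):
        self._data[entity_id] = blob

    def get(self, entity_id):
        return self._data.get(entity_id)


class TableStore:
    """Stand-in for DB table storage, keyed on the entity ID."""

    def __init__(self, conn):
        self._conn = conn
        conn.execute(
            "CREATE TABLE IF NOT EXISTS check_results ("
            " entity_id TEXT NOT NULL PRIMARY KEY,"
            " result_blob BLOB NOT NULL)"
        )

    def set(self, entity_id, blob):
        self._conn.execute(
            "INSERT OR REPLACE INTO check_results VALUES (?, ?)",
            (entity_id, blob),
        )
        self._conn.commit()

    def get(self, entity_id):
        row = self._conn.execute(
            "SELECT result_blob FROM check_results WHERE entity_id = ?",
            (entity_id,),
        ).fetchone()
        return row[0] if row else None


def make_store(config):
    """Pick a backend from config; memcached-style storage is the default."""
    backend = config.get("resultStorage", "memcached")  # hypothetical key
    if backend == "memcached":
        return CacheStore()
    if backend == "db":
        return TableStore(sqlite3.connect(":memory:"))
    raise ValueError("unknown storage backend: %s" % backend)
```

Because both backends expose the same `get`/`set` interface, callers do not need to know which one is active, and the default stays on the existing behaviour until the setting is changed.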