
Sockpuppet API: Ensure data consistency across service instances
Open, Needs Triage, Public, 1 Estimated Story Points

Description

In the current data model, global data structures are used to hold state (the input datasets). These are mutated at run time, for example when new edit data becomes available. This means that these structures are both storage and a cache for hot changes. To the best of my knowledge, these caches are never flushed and are only regenerated when the application restarts.

The use of globals in Flask is neither thread-safe nor process-safe. Generally speaking, WSGI (or WSGI-like) servers spawn multiple worker processes, and state held in globals is not shared between them. We should ensure that, once the backend moves to a database, these updates are recorded and shared across the service hosts.
We should also make sure that re-runs of the ETL pipeline produce datasets consistent with the application caches (arguably, we should make the service stateless).
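
As a rough illustration of the stateless direction, here is a minimal sketch in which updates go to a shared database rather than a module-level global. Everything in it is an assumption made for illustration: the `EditStore` class, the `edits` table, and its columns are hypothetical and not taken from the Sockpuppet API code, and SQLite only stands in for whatever database the backend eventually uses.

```python
import os
import sqlite3
import tempfile
from contextlib import closing

# Hypothetical schema: one row per (user, page) edit.
SCHEMA = """
CREATE TABLE IF NOT EXISTS edits (
    username TEXT NOT NULL,
    page     TEXT NOT NULL,
    PRIMARY KEY (username, page)
)
"""


class EditStore:
    """Keeps no data in memory; every call reads from or writes to the database."""

    def __init__(self, db_path):
        self.db_path = db_path
        with closing(sqlite3.connect(self.db_path)) as conn:
            with conn:  # commits on success, rolls back on error
                conn.execute(SCHEMA)

    def record_edit(self, username, page):
        # Write-through: the update is persisted immediately instead of
        # being held in a per-process global.
        with closing(sqlite3.connect(self.db_path)) as conn:
            with conn:
                conn.execute(
                    "INSERT OR IGNORE INTO edits (username, page) VALUES (?, ?)",
                    (username, page),
                )

    def pages_edited_by(self, username):
        with closing(sqlite3.connect(self.db_path)) as conn:
            rows = conn.execute(
                "SELECT page FROM edits WHERE username = ? ORDER BY page",
                (username,),
            ).fetchall()
        return [row[0] for row in rows]


def demo():
    # Two independent store objects stand in for two worker processes.
    db_path = os.path.join(tempfile.mkdtemp(), "edits.db")
    worker_a = EditStore(db_path)
    worker_b = EditStore(db_path)
    worker_a.record_edit("ExampleUser", "Page_1")
    # worker_b sees the update because it lives in the database,
    # not in worker_a's memory.
    return worker_b.pages_edited_by("ExampleUser")
```

Calling `demo()` records an edit through one store and reads it back through the other. With this shape, a re-run of the ETL pipeline and the running service would both read the same tables, so there would be no separate in-process cache to drift out of sync.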