Several tasks have discussed changes to the ORES architecture. ORES is currently a collection of web nodes that communicate with a pool of Celery workers through Celery's task queue. This architecture lets us handle several different use cases predictably:
- Realtime scoring of edits/revisions/pages as they are created/saved. (patrollers/bots)
- Historical scoring of edits long after they are saved. (research/patroller/organizer)
- Batch processing of large amounts of edits/revisions/pages. (research/analytics)
Some issues have been raised about ORES's current architecture:
- Redis SPOF: Redis is a single point of failure (SPOF)
- MWAPI IO: IO via MediaWiki API (MWAPI) calls takes a non-negligible amount of time
- Nonstandard API: ores.wikimedia.org is an independent API endpoint
Some proposals have been raised for improving the functionality of ORES:
- Feature store: Feature stores are becoming common in modern ML services/systems. We should invest in one.
- Stream architecture: We could decouple IO operations and CPU operations using Kafka or some other streaming architecture to move away from our worker pool/result store strategy.
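To make the two proposals above concrete, here is a minimal sketch of how a feature store and a decoupled IO/CPU pipeline might fit together. It is an illustration under stated assumptions, not ORES code: `fetch_features`, `score`, `FeatureStore`, and `run_pipeline` are hypothetical names, the in-memory dictionary stands in for a real feature store, and the standard-library queues stand in for Kafka topics (or whatever streaming layer is chosen).

```python
import queue
import threading

# All names below are hypothetical stand-ins; none come from ORES itself.


def fetch_features(rev_id):
    """IO-bound step (stands in for a MediaWiki API call)."""
    return {"rev_id": rev_id, "chars_added": rev_id % 100}


def score(features):
    """CPU-bound step (stands in for model inference)."""
    return {
        "rev_id": features["rev_id"],
        "damaging": features["chars_added"] > 50,
    }


class FeatureStore:
    """In-memory feature cache keyed by revision ID (assumed design)."""

    def __init__(self, fetch):
        self._fetch = fetch
        self._cache = {}
        self.misses = 0  # counts how often the slow IO path was taken

    def get(self, rev_id):
        if rev_id not in self._cache:
            self.misses += 1
            self._cache[rev_id] = self._fetch(rev_id)
        return self._cache[rev_id]


_DONE = object()  # sentinel that marks the end of a stream


def io_stage(rev_ids, store, features_q):
    """Resolve features for each revision and publish them downstream."""
    for rev_id in rev_ids:
        features_q.put(store.get(rev_id))
    features_q.put(_DONE)


def cpu_stage(features_q, scores_q):
    """Consume features and publish scores; performs no IO of its own."""
    while True:
        item = features_q.get()
        if item is _DONE:
            scores_q.put(_DONE)
            break
        scores_q.put(score(item))


def run_pipeline(rev_ids, store):
    """Run both stages concurrently and collect scores in arrival order."""
    features_q = queue.Queue()
    scores_q = queue.Queue()
    threads = [
        threading.Thread(target=io_stage, args=(rev_ids, store, features_q)),
        threading.Thread(target=cpu_stage, args=(features_q, scores_q)),
    ]
    for thread in threads:
        thread.start()
    results = []
    while True:
        item = scores_q.get()
        if item is _DONE:
            break
        results.append(item)
    for thread in threads:
        thread.join()
    return results
```

In this sketch the scoring stage only ever sees precomputed features, so slow MWAPI calls cannot block it, and a repeated request for the same revision is served from the store instead of triggering a second fetch. In a real deployment each stage would presumably run as a separate process or service, since Python threads alone would not parallelize the CPU-bound work.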