The ORES uWSGI service makes IO requests to the MediaWiki API and passes the resulting data to Celery workers for feature calculation and scoring. To pass the data, the service serializes it as JSON and sends it over the wire. I believe this data can be several megabytes, so we should measure how long it takes to serialize it and transmit it to the workers. Communication costs are often a bottleneck in parallel computing.
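As a starting point for that measurement, here is a minimal sketch that times a JSON round trip for a payload of about 2 MB. The payload shape, the `rev_id` and `content` keys, and the helper names `make_payload` and `time_json_round_trip` are assumptions for illustration and are not taken from the ORES code. It only covers serialization and deserialization, not the time spent on the wire or in the broker.

```python
import json
import time


def make_payload(target_bytes=2_000_000):
    """Build a synthetic dict whose JSON encoding is roughly target_bytes long.

    The keys are hypothetical stand-ins for whatever ORES actually sends.
    """
    text = "lorem ipsum " * (target_bytes // 12)
    return {"rev_id": 12345, "content": text}


def time_json_round_trip(payload, repeats=5):
    """Return (size_bytes, best_dumps_seconds, best_loads_seconds)."""
    dump_times, load_times = [], []
    encoded = b""
    for _ in range(repeats):
        start = time.perf_counter()
        encoded = json.dumps(payload).encode("utf-8")
        dump_times.append(time.perf_counter() - start)

        start = time.perf_counter()
        decoded = json.loads(encoded.decode("utf-8"))
        load_times.append(time.perf_counter() - start)

        # Sanity check: the round trip must preserve the data.
        assert decoded == payload
    # Report the best run to reduce noise from other processes.
    return len(encoded), min(dump_times), min(load_times)


if __name__ == "__main__":
    size, dumps_s, loads_s = time_json_round_trip(make_payload())
    print(f"{size / 1e6:.2f} MB: dumps {dumps_s * 1e3:.2f} ms, "
          f"loads {loads_s * 1e3:.2f} ms")
```

Running the same timing against real payloads captured from the service would give more meaningful numbers than synthetic text.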
If this turns out to be a significant bottleneck, we could consider alternatives such as shared-memory IPC or making Celery pools local to each server.
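To make the shared-memory idea concrete, here is a rough sketch using Python's standard `multiprocessing.shared_memory` module (Python 3.8+). This is not how ORES or Celery work today; the helper names `put_in_shared_memory` and `read_from_shared_memory` are hypothetical, and the example assumes the producer and the worker run on the same host. Only the block name and the payload length would need to travel through the task queue. The sketch still pays the JSON encoding cost and only avoids sending the bytes over the wire.

```python
import json
from multiprocessing import shared_memory


def put_in_shared_memory(payload):
    """Serialize payload and copy it into a new shared memory block.

    Returns the block and the payload length. The length is tracked
    separately because some platforms round the block size up.
    """
    data = json.dumps(payload).encode("utf-8")
    shm = shared_memory.SharedMemory(create=True, size=len(data))
    shm.buf[:len(data)] = data
    return shm, len(data)


def read_from_shared_memory(name, length):
    """Attach to an existing block by name and decode its contents."""
    shm = shared_memory.SharedMemory(name=name)
    try:
        return json.loads(bytes(shm.buf[:length]).decode("utf-8"))
    finally:
        shm.close()


if __name__ == "__main__":
    # Hypothetical payload; a real worker would receive (name, length)
    # in its task arguments instead of the full data.
    shm, length = put_in_shared_memory({"rev_id": 12345, "content": "x" * 1_000_000})
    try:
        result = read_from_shared_memory(shm.name, length)
        print(len(result["content"]))
    finally:
        shm.close()
        shm.unlink()  # the creator is responsible for freeing the block
```

A real design would also need to decide who unlinks the block if a worker crashes, which this sketch does not address.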