The NIH database seems to have had a lot of problems lately. Connecting to their servers takes a long time, and so does getting an answer to a request. This causes Citoid to queue up incoming connections, which in turn causes new connections to time out, as seen by our monitoring:
icinga-wm: PROBLEM - citoid endpoints health on scb2001 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
Note that this is a combination of both Zotero and NIH taking their time to respond, as well as the number of requests made by Citoid itself.
Because the NIH db is the prime source for PM(C) IDs, we need to keep it around, but find a way to limit its impact on Citoid. An obvious idea that comes to mind is to lower the TCP socket connection time-out, but AFAIK we can change that only system-wide, which could have unwanted consequences. Any ideas?
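
One possible direction, as a rough sketch rather than a tested fix: enforce a time budget per outgoing request inside the service, instead of touching the system-wide TCP setting. The helper below is hypothetical (`withTimeout`, `TimeoutError` and the budgets are illustrative names and values, not existing Citoid code). It gives up on a slow upstream call after a fixed number of milliseconds and signals the call to abort.

```typescript
// Hypothetical helper, not existing Citoid code: names and budgets are illustrative.
class TimeoutError extends Error {
  constructor(message: string) {
    super(message);
    this.name = "TimeoutError";
  }
}

// Runs `work` and rejects with TimeoutError if it has not settled within `ms`.
// The AbortSignal lets the upstream call release its socket when we give up.
function withTimeout<T>(
  work: (signal: AbortSignal) => Promise<T>,
  ms: number,
): Promise<T> {
  const controller = new AbortController();
  return new Promise<T>((resolve, reject) => {
    const timer = setTimeout(() => {
      controller.abort(); // tell the upstream call to stop
      reject(new TimeoutError(`upstream did not answer within ${ms} ms`));
    }, ms);
    // Promise.resolve().then(...) also turns a synchronous throw into a rejection.
    Promise.resolve()
      .then(() => work(controller.signal))
      .then(
        (value) => { clearTimeout(timer); resolve(value); },
        (err) => { clearTimeout(timer); reject(err); },
      );
  });
}
```

Assuming the HTTP client in use accepts an `AbortSignal` (the global `fetch` in recent Node.js versions does), a lookup could then be wrapped as `withTimeout((signal) => fetch(lookupUrl, { signal }), 2000)`, where `lookupUrl` and the 2000 ms budget are placeholders. A request that exceeds the budget fails fast with a `TimeoutError` instead of holding a slot in the queue.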