Description
Ever since we switched to the new Zotero translation server, Citoid requests to Zotero occasionally time out. We should investigate this. Relevant Grafana dashboard: https://grafana.wikimedia.org/dashboard/db/kubernetes-pods?panelId=32&fullscreen&orgId=1&from=now-16h&to=now
Details
Subject | Repo | Branch | Lines +/-
---|---|---|---
Update Zotero; potentially fixes OOM issues | mediawiki/services/zotero | master | +1 K -298
Related Objects
Mentioned In
- T211871: TFA missing from MCS response
- T211070: decommission of restbase200[1-6] (lease return in December 2018)
- T211148: QIDs work locally but not in production with new translation-server
Mentioned Here
- T211070: decommission of restbase200[1-6] (lease return in December 2018)
- T211871: TFA missing from MCS response
- T211148: QIDs work locally but not in production with new translation-server
Event Timeline
As a first pass, maybe we should update Zotero? It's a few months old now.
Also, is this perhaps what was happening with T211148? (Looks okay now, though.)
I can get Zotero to time out locally with the DOI 10.1098/rspb.2000.1188.
However, if I skip resolving the DOI and use the URL https://royalsocietypublishing.org/doi/abs/10.1098/rspb.2000.1188 directly, I do get a response without timing out, though it takes 8 seconds.
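For anyone trying to reproduce this, here is a minimal sketch of those two requests as a script, assuming a local translation-server instance on the default port 1969 with its plain-text /search (identifier) and /web (URL) endpoints; the port, endpoint names, and client-side timeout are assumptions for illustration, not details confirmed in this task:

```python
import requests

BASE = "http://127.0.0.1:1969"  # assumed local translation-server address
DOI = "10.1098/rspb.2000.1188"
URL = "https://royalsocietypublishing.org/doi/abs/10.1098/rspb.2000.1188"

def probe(endpoint, payload):
    """POST a plain-text identifier or URL and report how long the response takes."""
    try:
        resp = requests.post(
            f"{BASE}/{endpoint}",
            data=payload,
            headers={"Content-Type": "text/plain"},
            timeout=30,  # generous client-side timeout so slow responses are still measured
        )
        print(endpoint, resp.status_code, f"{resp.elapsed.total_seconds():.1f}s")
    except requests.exceptions.Timeout:
        print(endpoint, "timed out")

probe("search", DOI)  # resolve via the DOI: the case that times out
probe("web", URL)     # fetch the article URL directly: slow (~8 s) but succeeds
```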
Mentioned in SAL (#wikimedia-operations) [2018-12-13T11:53:49Z] <mobrovac@deploy1001> Started deploy [restbase/deploy@55fcd4b]: Remove restbase200[1-6], ensure body.tfa exists for feed responses and disable Citoid check - T211070 T211871 T211411
Mentioned in SAL (#wikimedia-operations) [2018-12-13T12:12:47Z] <mobrovac@deploy1001> Finished deploy [restbase/deploy@55fcd4b]: Remove restbase200[1-6], ensure body.tfa exists for feed responses and disable Citoid check - T211070 T211871 T211411 (duration: 18m 59s)
Change 483703 had a related patch set uploaded (by Mvolz; owner: Mvolz):
[mediawiki/services/zotero@master] Update Zotero; fix OOM issues
Change 483703 merged by jenkins-bot:
[mediawiki/services/zotero@master] Update Zotero; potentially fixes OOM issues
How are the timeouts looking since the redeploy on 2019-01-17? I'm having trouble interpreting the graph.
There have been no timeouts recorded by the automatic check scripts since the deploy, so this is looking good. Resolving for now; let's reopen if things change.
Perhaps because the checks were commented out and never re-enabled :D
@Pchelolo should we re-enable and see how it goes?
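If we do re-enable it, even a very simple probe would catch a regression. A hedged sketch of that kind of check, assuming all we want is to fail when a known-good citation request exceeds a deadline; the endpoint, DOI, and 10-second threshold below are illustrative and not the actual check script:

```python
import sys
import requests

# Illustrative values only: not the real check script or its configuration.
CITOID_URL = ("https://en.wikipedia.org/api/rest_v1/data/citation/mediawiki/"
              "10.1098%2Frspb.2000.1188")
DEADLINE_S = 10  # assumed acceptable response time

def citoid_responds_in_time():
    """Return True if Citoid answers the test citation within the deadline."""
    try:
        resp = requests.get(CITOID_URL, timeout=DEADLINE_S)
        resp.raise_for_status()
        return True
    except requests.exceptions.RequestException:
        return False

if __name__ == "__main__":
    # Non-zero exit signals failure to whatever scheduler runs the check.
    sys.exit(0 if citoid_responds_in_time() else 1)
```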