Citoid automated monitoring times out due to Zotero v2
Closed, ResolvedPublic

Description

Ever since we switched to the new Zotero translation server, Citoid requests to Zotero time out occasionally. We should investigate this. Relevant grafana dashboard: https://grafana.wikimedia.org/dashboard/db/kubernetes-pods?panelId=32&fullscreen&orgId=1&from=now-16h&to=now

Event Timeline

mobrovac created this task.

As a first pass maybe we should update Zotero? It's a few months old now.

Also is this perhaps what was happening with T211148? (Looks okay now though.)

I can get Zotero to time out locally with the DOI 10.1098/rspb.2000.1188.

However, if I skip resolving the DOI and use the URL https://royalsocietypublishing.org/doi/abs/10.1098/rspb.2000.1188 directly, I get a response without timing out, though it takes 8 seconds.
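
For anyone who wants to reproduce the comparison, here is a rough sketch of how the two lookups could be timed against a local translation server. It assumes the server listens on http://127.0.0.1:1969 and accepts plain-text POSTs on /search (identifier) and /web (URL), as I understand the upstream translation-server to do. The helper names `build_request` and `timed_call` are made up for this example and are not part of Citoid or Zotero.

```python
import time
import urllib.request

# Assumption: a local translation-server instance on its default port.
ZOTERO_BASE = "http://127.0.0.1:1969"


def build_request(endpoint, payload):
    """Build a plain-text POST for the /search (identifier) or /web (URL) endpoint."""
    return urllib.request.Request(
        f"{ZOTERO_BASE}/{endpoint}",
        data=payload.encode("utf-8"),
        headers={"Content-Type": "text/plain"},
        method="POST",
    )


def timed_call(endpoint, payload, timeout=30.0, opener=urllib.request.urlopen):
    """Return (ok, elapsed_seconds); ok is False on a timeout, HTTP error,
    or connection failure."""
    req = build_request(endpoint, payload)
    start = time.monotonic()
    try:
        with opener(req, timeout=timeout) as resp:
            resp.read()  # drain the body so the full response time is measured
            ok = resp.status == 200
    except OSError:
        # URLError, HTTPError and socket timeouts are all OSError subclasses.
        ok = False
    return ok, time.monotonic() - start
```

Calling `timed_call("search", "10.1098/rspb.2000.1188")` and then `timed_call("web", "https://royalsocietypublishing.org/doi/abs/10.1098/rspb.2000.1188")` should show whether the DOI resolution step is what pushes the request past the timeout. The 30-second default is an arbitrary choice for the sketch, not the value Citoid uses.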

Stashbot subscribed.

Mentioned in SAL (#wikimedia-operations) [2018-12-13T11:53:49Z] <mobrovac@deploy1001> Started deploy [restbase/deploy@55fcd4b]: Remove restbase200[1-6], ensure body.tfa exists for feed responses and disable Citoid check - T211070 T211871 T211411

Mentioned in SAL (#wikimedia-operations) [2018-12-13T12:12:47Z] <mobrovac@deploy1001> Finished deploy [restbase/deploy@55fcd4b]: Remove restbase200[1-6], ensure body.tfa exists for feed responses and disable Citoid check - T211070 T211871 T211411 (duration: 18m 59s)

Change 483703 had a related patch set uploaded (by Mvolz; owner: Mvolz):
[mediawiki/services/zotero@master] Update Zotero; fix OOM issues

https://gerrit.wikimedia.org/r/483703

Change 483703 merged by jenkins-bot:
[mediawiki/services/zotero@master] Update Zotero; potentially fixes OOM issues

https://gerrit.wikimedia.org/r/483703

How are the timeouts looking since the redeploy on 2019-01-17? I'm having trouble interpreting the graph.

mobrovac edited projects, added serviceops; removed Patch-For-Review.

There have been no timeouts recorded by the automatic check scripts since the deploy, so looking good. Resolving for now; let's reopen if things change.

Perhaps because the checks were commented out and never re-enabled :D

@Pchelolo should we re-enable and see how it goes?

https://github.com/wikimedia/restbase/pull/1274