
Citoid automated monitoring times out due to Zotero v2
Closed, Resolved · Public

Description

Ever since we switched to the new Zotero translation server, Citoid requests to Zotero time out occasionally. We should investigate this. Relevant grafana dashboard: https://grafana.wikimedia.org/dashboard/db/kubernetes-pods?panelId=32&fullscreen&orgId=1&from=now-16h&to=now

Details

Related Gerrit Patches:
mediawiki/services/zotero : master · Update Zotero; potentially fixes OOM issues

Event Timeline

mobrovac triaged this task as High priority. · Dec 7 2018, 10:32 AM
mobrovac created this task.
Restricted Application added a subscriber: Aklapper. · Dec 7 2018, 10:32 AM
Mvolz added a subscriber: Mvolz. · Dec 7 2018, 2:21 PM
fsero claimed this task. · Dec 10 2018, 10:24 AM
fsero moved this task from Backlog to In progress on the Prod-Kubernetes board.

As a first pass maybe we should update Zotero? It's a few months old now.

Also is this perhaps what was happening with T211148? (Looks okay now though.)

Mvolz moved this task from Backlog to Production on the Citoid board. · Dec 11 2018, 11:31 AM
Mvolz added a comment. · Dec 11 2018, 1:23 PM

I can get Zotero to time out locally with the DOI 10.1098/rspb.2000.1188.

However, if I skip resolving the DOI and use the URL https://royalsocietypublishing.org/doi/abs/10.1098/rspb.2000.1188 directly, I get a response without timing out, though it takes 8 seconds.
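
A minimal sketch for reproducing this comparison locally, not the exact commands used above. It assumes a Zotero translation-server instance listening on 127.0.0.1:1969 that accepts plain-text POST bodies on /search (for identifiers such as DOIs) and /web (for URLs). The address, the 30-second timeout, and the helper names build_request, timed, fetch, and main are illustrative assumptions.

```python
import time
import urllib.request

# Assumed address of a locally running translation-server instance.
BASE = "http://127.0.0.1:1969"


def build_request(endpoint, payload, base=BASE):
    """Build a plain-text POST request for the given endpoint (hypothetical helper)."""
    return urllib.request.Request(
        f"{base}/{endpoint}",
        data=payload.encode("utf-8"),
        headers={"Content-Type": "text/plain"},
        method="POST",
    )


def timed(fn, *args, **kwargs):
    """Call fn and return (result_or_exception, elapsed_seconds)."""
    start = time.monotonic()
    try:
        result = fn(*args, **kwargs)
    except Exception as exc:  # timeouts and connection errors end up here
        result = exc
    return result, time.monotonic() - start


def fetch(endpoint, payload, timeout=30.0):
    """Send the request and return the HTTP status code."""
    request = build_request(endpoint, payload)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.status


def main():
    """Time the DOI lookup against the direct URL lookup."""
    cases = [
        ("search", "10.1098/rspb.2000.1188"),
        ("web", "https://royalsocietypublishing.org/doi/abs/10.1098/rspb.2000.1188"),
    ]
    for endpoint, payload in cases:
        result, elapsed = timed(fetch, endpoint, payload)
        print(f"/{endpoint}: {result!r} after {elapsed:.1f}s")
```

Calling main() prints one line per input with either the HTTP status or the exception raised, plus the elapsed time, which makes it easy to see which path hits the timeout.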

Stashbot added a subscriber: Stashbot.

Mentioned in SAL (#wikimedia-operations) [2018-12-13T11:53:49Z] <mobrovac@deploy1001> Started deploy [restbase/deploy@55fcd4b]: Remove restbase200[1-6], ensure body.tfa exists for feed responses and disable Citoid check - T211070 T211871 T211411

Mentioned in SAL (#wikimedia-operations) [2018-12-13T12:12:47Z] <mobrovac@deploy1001> Finished deploy [restbase/deploy@55fcd4b]: Remove restbase200[1-6], ensure body.tfa exists for feed responses and disable Citoid check - T211070 T211871 T211411 (duration: 18m 59s)

Change 483703 had a related patch set uploaded (by Mvolz; owner: Mvolz):
[mediawiki/services/zotero@master] Update Zotero; fix OOM issues

https://gerrit.wikimedia.org/r/483703

Change 483703 merged by jenkins-bot:
[mediawiki/services/zotero@master] Update Zotero; potentially fixes OOM issues

https://gerrit.wikimedia.org/r/483703

How are the timeouts looking since the redeploy on 2019-01-17? I'm having trouble interpreting the graph.

mobrovac closed this task as Resolved. · Jan 27 2019, 7:49 PM
mobrovac edited projects, added serviceops; removed Patch-For-Review.

The automatic check scripts have recorded no timeouts since the deploy, so things look good. Resolving for now; let's reopen if things change.