Citoid automated monitoring times out due to Zotero v2
Closed, Resolved, Public

Authored By

	• mobrovac
	Dec 7 2018, 10:32 AM

Description

Ever since we switched to the new Zotero translation server, Citoid requests to Zotero have occasionally timed out. We should investigate this. Relevant Grafana dashboard: https://grafana.wikimedia.org/dashboard/db/kubernetes-pods?panelId=32&fullscreen&orgId=1&from=now-16h&to=now

Details

	Subject	Repo	Branch	Lines +/-
	Update Zotero; potentially fixes OOM issues	mediawiki/services/zotero	master	+1 K -298


Related Objects

Mentioned In:
T211871: TFA missing from MCS response
T211070: decommission of restbase200[1-6] (lease return in December 2018)
T211148: QIDs work locally but not in production with new translation-server

Mentioned Here:
T211070: decommission of restbase200[1-6] (lease return in December 2018)
T211871: TFA missing from MCS response
T211148: QIDs work locally but not in production with new translation-server

Event Timeline

• mobrovac triaged this task as High priority. Dec 7 2018, 10:32 AM

• mobrovac created this task.

Restricted Application added a subscriber: Aklapper. Dec 7 2018, 10:32 AM

Mvolz subscribed. Dec 7 2018, 2:21 PM

Mvolz mentioned this in T211148: QIDs work locally but not in production with new translation-server. Dec 9 2018, 10:06 AM

• fsero claimed this task. Dec 10 2018, 10:24 AM

• fsero moved this task from Backlog to In progress on the Prod-Kubernetes board.

As a first pass maybe we should update Zotero? It's a few months old now.

Also is this perhaps what was happening with T211148? (Looks okay now though.)

Mvolz moved this task from Backlog to Production on the Citoid board. Dec 11 2018, 11:31 AM

I can get Zotero to time out locally with the DOI 10.1098/rspb.2000.1188.

However, if I skip resolving the DOI and use the URL https://royalsocietypublishing.org/doi/abs/10.1098/rspb.2000.1188 directly, I get a response without a timeout, though it still takes 8 seconds.
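For anyone who wants to repeat this comparison, here is a rough sketch of how the two requests could be timed against a local translation-server. It rests on assumptions not stated in this task: that the server listens on 127.0.0.1:1969, that identifiers such as DOIs are POSTed as plain text to /search and URLs to /web, and that 10 seconds is a reasonable client timeout. The helper names build_request and timed are made up for illustration.

```python
import socket
import time
import urllib.error
import urllib.request

# Assumption: a translation-server instance is running locally on this port.
BASE = "http://127.0.0.1:1969"


def build_request(endpoint: str, body: str) -> urllib.request.Request:
    """Build a plain-text POST for the /search (identifier) or /web (URL) endpoint."""
    return urllib.request.Request(
        f"{BASE}/{endpoint}",
        data=body.encode("utf-8"),
        headers={"Content-Type": "text/plain"},
        method="POST",
    )


def timed(req, timeout_s: float = 10.0, opener=urllib.request.urlopen):
    """Return (outcome, seconds); outcome is an HTTP status code, 'timeout' or 'error'."""
    start = time.monotonic()
    try:
        with opener(req, timeout=timeout_s) as resp:
            resp.read()
            outcome = resp.status
    except urllib.error.HTTPError as exc:
        # The server answered, just not with a 2xx status.
        outcome = exc.code
    except (socket.timeout, TimeoutError):
        # Timeout while reading the response.
        outcome = "timeout"
    except urllib.error.URLError as exc:
        # Timeout while connecting is wrapped in URLError.
        is_timeout = isinstance(exc.reason, (socket.timeout, TimeoutError))
        outcome = "timeout" if is_timeout else "error"
    return outcome, time.monotonic() - start


if __name__ == "__main__":
    doi = "10.1098/rspb.2000.1188"
    url = "https://royalsocietypublishing.org/doi/abs/10.1098/rspb.2000.1188"
    for endpoint, body in (("search", doi), ("web", url)):
        outcome, seconds = timed(build_request(endpoint, body))
        print(f"/{endpoint}: {outcome} after {seconds:.1f}s")
```

Running both requests with the same timeout makes it easier to see whether the DOI resolution step or the page translation itself accounts for most of the delay.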

Mentioned in SAL (#wikimedia-operations) [2018-12-13T11:53:49Z] <mobrovac@deploy1001> Started deploy [restbase/deploy@55fcd4b]: Remove restbase200[1-6], ensure body.tfa exists for feed responses and disable Citoid check - T211070 T211871 T211411

Mentioned in SAL (#wikimedia-operations) [2018-12-13T12:12:47Z] <mobrovac@deploy1001> Finished deploy [restbase/deploy@55fcd4b]: Remove restbase200[1-6], ensure body.tfa exists for feed responses and disable Citoid check - T211070 T211871 T211411 (duration: 18m 59s)

Change 483703 had a related patch set uploaded (by Mvolz; owner: Mvolz):
[mediawiki/services/zotero@master] Update Zotero; fix OOM issues

https://gerrit.wikimedia.org/r/483703

gerritbot added a project: Patch-For-Review. Jan 11 2019, 10:09 AM

Change 483703 merged by jenkins-bot:
[mediawiki/services/zotero@master] Update Zotero; potentially fixes OOM issues

https://gerrit.wikimedia.org/r/483703

How are the timeouts looking since the redeploy on 2019-01-17? I'm having trouble interpreting the graph.

• Phabricator_maintenance moved this task from Backlog to Acknowledged on the SRE board. Jan 26 2019, 10:51 PM

There have been no timeouts recorded by the automatic check scripts since the deploy, so looking good. Resolving for now, let's reopen if things change.

In T211411#4913374, @mobrovac wrote:

There have been no timeouts recorded by the automatic check scripts since the deploy, so looking good. Resolving for now, let's reopen if things change.

Perhaps because the checks were commented out and never re-enabled :D

@Pchelolo should we re-enable and see how it goes?

https://github.com/wikimedia/restbase/pull/1274
