
Parallelise pubmed requests to get IDs earlier in the request chain and skip it when the DOIs are scraped from the page (rarer occurrence.)
Closed, Resolved · Public · 1 Estimated Story Points

Description

I am getting lots of timeouts in VE, I think because the Citoid extension times out before the service does.

Suspiciously, these are all journal articles, e.g.:

http://www.apidologie.org/articles/apido/abs/2004/04/M4014/M4014.html
https://www.nature.com/nature/journal/v546/n7660/full/nature22375.html
http://journals.plos.org/ploscompbiol/article?id=10.1371/journal.pcbi.1002947

I think this may be caused by the PubMed service.

Event Timeline

Restricted Application added a subscriber: Aklapper.
Mvolz renamed this task from Unable to make citation from http://www.apidologie.org/articles/apido/abs/2004/04/M4014/M4014.html to Timed out when requesting http://www.apidologie.org/articles/apido/abs/2004/04/M4014/M4014.html. Apr 13 2017, 1:44 PM
Mvolz closed this task as Invalid.
Mvolz reopened this task as Open.
Mvolz triaged this task as Medium priority.
Mvolz updated the task description.
Mvolz renamed this task from Timed out when requesting http://www.apidologie.org/articles/apido/abs/2004/04/M4014/M4014.html to Lots of time outs in Citoid extension. Jun 29 2017, 10:20 AM
Mvolz raised the priority of this task from Medium to High.
Mvolz added a project: Services.
Mvolz updated the task description.
Mvolz added a subscriber: mobrovac.

From IRC:

(10:01:49 AM) mvolz: [13:55:14] mobrovac: I think we need to chat about response times. I'm not sure how to go about fixing this but in the morning citoid was basically unusable for me because the extension kept timing out: https://phabricator.wikimedia.org/T162886
(10:01:49 AM) mvolz: [13:55:25] It's a really bad user experience.
(10:01:49 AM) mvolz: [13:55:42] Is there a way maybe we can force citoid to respond within a certain amount of time?
(10:01:49 AM) mvolz: [13:55:55] That matches the time out of the extension?
(10:01:49 AM) mvolz: [13:56:46] I'm not sure how I would begin to go about doing that. 
(10:07:20 AM) mobrovac: mvolz: all the services in that DC alerted today, so it might not be citoid-specific
(10:07:34 AM) mvolz: hmm, okay :)
(10:08:35 AM) mobrovac: mvolz: but generally speaking, citoid's problem is mostly zotero, as it takes a long time to respond
(10:09:09 AM) mvolz: yeah, I'm not sure what proportion of that is coming from pubmed though
(10:09:35 AM) mvolz: we make a request to pubmed for every item that has a doi, in order to add a pmid to it
(10:09:43 AM) mvolz: I was thinking of configuring it to disable it.
(10:10:00 AM) mvolz: what do you think? Worth trying? Is there a way to benchmark this somehow?
(10:12:22 AM) mobrovac: mvolz: hm but if we disable that then we'll start getting complaints (again), as the expected functionality would not be there
(10:12:53 AM) mobrovac: mvolz: one idea could be to make parallel requests - zotero, htmldata and pubmed
(10:14:50 AM) mvolz: well, imo the pmcid and pmc when you already have a doi is extraneous. But I guess that's why I was wondering if we could benchmark. See if it saves us time
(10:15:03 AM) mvolz: The problem with parallelising it is that we can't always do that
(10:15:19 AM) mvolz: If we have the doi to begin with then possibly
(10:15:26 AM) mvolz: but if we get the doi from scraping not so much
(10:16:00 AM) mvolz: But yeah let's try parallelising it when we can instead of doing it in exporter maybe?
(10:16:30 AM) mvolz: And then if we get the doi too late just skip adding the pmid?
(10:17:16 AM) mobrovac: sounds like a plan
(10:17:34 AM) mobrovac: if we get the doi late, that could be a config option
(10:17:40 AM) mobrovac: and then we play with it
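
The plan agreed above can be sketched roughly as follows. This is only an illustration in Python, not the citoid code: `fetch_citation`, `fetch_pmid`, `SCRAPED_DOIS`, and the `wait_for_late_doi` option are all hypothetical names, and the sleeps stand in for network latency. When the DOI is known up front, the PubMed lookup runs concurrently with the main scrape; when the DOI only turns up from scraping, the PMID lookup is skipped unless the option says to wait.

```python
import asyncio
from typing import Optional

# Hypothetical page data: URL -> DOI that scraping would discover.
SCRAPED_DOIS = {"https://example.org/article": "10.1000/example"}


async def fetch_citation(url: str) -> dict:
    """Stand-in for the slow scrape (Zotero / HTML metadata)."""
    await asyncio.sleep(0.01)
    citation = {"url": url}
    if url in SCRAPED_DOIS:
        citation["DOI"] = SCRAPED_DOIS[url]
    return citation


async def fetch_pmid(doi: str) -> Optional[str]:
    """Stand-in for the PubMed ID lookup; returns a made-up PMID."""
    await asyncio.sleep(0.01)
    return "00000000"


async def cite(url: str, doi: Optional[str] = None,
               wait_for_late_doi: bool = False) -> dict:
    if doi is not None:
        # DOI known up front: run both requests concurrently.
        citation, pmid = await asyncio.gather(fetch_citation(url),
                                              fetch_pmid(doi))
        citation["DOI"] = doi
    else:
        citation = await fetch_citation(url)
        late_doi = citation.get("DOI")
        # DOI only found by scraping: look up the PMID only if configured to wait.
        pmid = await fetch_pmid(late_doi) if (late_doi and wait_for_late_doi) else None
    if pmid:
        citation["PMID"] = pmid
    return citation


early = asyncio.run(cite("https://example.org/other", doi="10.1000/example"))
late = asyncio.run(cite("https://example.org/article"))
```

Here `early` carries a PMID because the lookup ran in parallel, while `late` keeps its scraped DOI but has no PMID unless `wait_for_late_doi=True` is passed.
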
Mvolz renamed this task from Lots of time outs in Citoid extension to Parallelise pubmed requests to get IDs earlier in the request chain and skip it when the DOIs are scraped from the page (rarer occurrence.). Jul 6 2017, 11:39 AM
Mvolz claimed this task.
Mvolz lowered the priority of this task from High to Medium.
Mvolz updated the task description.

Change 363593 had a related patch set uploaded (by Mvolz; owner: Marielle Volz):
[mediawiki/services/citoid@master] Try to improve performance with pubmed

https://gerrit.wikimedia.org/r/363593

Change 363593 merged by Mobrovac:
[mediawiki/services/citoid@master] Try to improve performance with pubmed

https://gerrit.wikimedia.org/r/363593

This is in production now. However, have we updated the config.yaml in production as well? Do we want to benchmark this at all? @mobrovac

Change 368946 had a related patch set uploaded (by Mobrovac; owner: Mobrovac):
[mediawiki/services/citoid/deploy@master] Config: Do not always wait for PubMed requests to complete

https://gerrit.wikimedia.org/r/368946

Change 368946 merged by Mobrovac:
[mediawiki/services/citoid/deploy@master] Config: Do not always wait for PubMed requests to complete

https://gerrit.wikimedia.org/r/368946

Mentioned in SAL (#wikimedia-operations) [2017-07-31T22:32:48Z] <mobrovac@tin> Started deploy [citoid/deploy@7ad598d]: Do not wait for PubMed requests to complete - T162886

Mentioned in SAL (#wikimedia-operations) [2017-07-31T22:38:40Z] <mobrovac@tin> Finished deploy [citoid/deploy@7ad598d]: Do not wait for PubMed requests to complete - T162886 (duration: 05m 52s)

Done. Let's monitor the situation and see how it performs now.

Mvolz removed a project: Patch-For-Review.