Science direct translator displays unusual behaviour (occasionally gets stuck in redirect loop).
Closed, Resolved | Public | 1 Estimated Story Points

Description

DOIs from Elsevier journals (some? all?) don't seem to produce useful citations.

This is because the ScienceDirect translator is not enabled for translation-server: https://github.com/zotero/translators/blob/master/ScienceDirect.js

'v' (the flag that marks a translator as usable by translation-server) needs to be added to the translator's "browserSupport" field.
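
For illustration, a minimal sketch of how one might check for that flag, assuming the translator file begins with a JSON metadata header. The sample header is hypothetical and abbreviated, not the real ScienceDirect.js metadata, and the function names are made up for this example.

```python
import json


def read_translator_metadata(source: str) -> dict:
    """Parse the JSON metadata object at the top of a translator file."""
    decoder = json.JSONDecoder()
    # raw_decode stops at the end of the first JSON value, so the
    # JavaScript code that follows the header is ignored.
    metadata, _end = decoder.raw_decode(source.lstrip())
    return metadata


def runs_on_server(metadata: dict) -> bool:
    """Return True if 'v' appears in the browserSupport string."""
    return "v" in metadata.get("browserSupport", "")


# Hypothetical, abbreviated header used only for this example.
SAMPLE = """{
    "label": "ExampleTranslator",
    "browserSupport": "gcsib"
}

function detectWeb(doc, url) {}
"""

metadata = read_translator_metadata(SAMPLE)
if not runs_on_server(metadata):
    print("translator is not enabled for translation-server")
```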

Event Timeline

Whatamidoing-WMF raised the priority of this task from to Needs Triage.
Whatamidoing-WMF updated the task description. (Show Details)
gpaumier set Security to None.
gpaumier subscribed.

@Whatamidoing-WMF Could you include some of the DOIs you tested as a starting point?

10.1016/j.cagd.2005.06.005 was reported as failing. I've seen no success reports for Elsevier, and I've asked for more information on wiki.

Jdforrester-WMF raised the priority of this task from High to Needs Triage.
Jdforrester-WMF triaged this task as High priority.
Jdforrester-WMF moved this task from Backlog to Site specific issues on the Citoid board.

Sample Elsevier DOIs (that don't work)
10.1016/j.entcs.2011.01.004
10.1016/S0140-6736(04)15715-2
10.1016/S1473-3099(10)70143-2
10.1016/S2213-8587(15)00033-9
10.1016/j.celrep.2015.05.024
10.1016/j.tics.2007.09.009

Mvolz renamed this task from Elsevier journals don't seem to work in Citoid to Enable science direct translator. Jun 20 2015, 7:08 PM
Mvolz renamed this task from Enable science direct translator to Enable science direct translator (if working).
Mvolz claimed this task.
Mvolz updated the task description. (Show Details)
Mvolz added a subscriber: mobrovac.

Change 219590 had a related patch set uploaded (by Mvolz):
Enable ScienceDirect translator

https://gerrit.wikimedia.org/r/219590

Change 219590 merged by Mobrovac:
Enable ScienceDirect translator

https://gerrit.wikimedia.org/r/219590

The translator has been enabled and tested, and it works. The patch is going live tomorrow, 2015-06-22.

Deployed, resolving.

Well, crap. It works on my master + localhost. What version of citoid is in deployment?

Reverted back for now as it causes Citoid to crash. Further investigations needed.

https://gerrit.wikimedia.org/r/#/c/210037/ is currently deployed. I checked it out locally and tried to reproduce the error, but no luck.

mobrovac raised the priority of this task from High to Unbreak Now!.
mobrovac added a subscriber: Mvolz.

Assigning to myself as I'll try to investigate further and/or hot-patch the deployment version if needed

It seems ScienceDirect translations trigger a chain of events. First, Zotero tries to retrieve the page, but a failure happens on the Squid proxy (P821). However, this is not declared as an error (way to go, Zotero!) because Squid returns an HTML document, which in turn is scraped by the ScienceDirect translator. As expected, all of the received lines are discarded as invalid, but magically the translation is still declared successful. This causes Zotero to return nothing, which finally breaks Citoid.

The last step belongs to work to be done in T101878: Improve Zotero Validation. As for the proxy issues, I'll try to debug them more tomorrow with Alex.

I have debugged this a bit more. I ran tests from sca1001 to check the responses, and unless I set the Cookie header I get the correct response. curl-ing with the exact same cookies as seen in the Zotero logs triggers an ERR_INVALID_REQ from Squid. Using the exact same request from localhost succeeds.
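
To make the Cookie observation concrete, here is a rough sketch of estimating how many bytes a set of request headers takes up on the wire and which single header lines exceed a limit. The header values and the 2048-byte limit are placeholders chosen for the example; the thread does not state the actual proxy limits.

```python
def header_line_size(name: str, value: str) -> int:
    """Bytes used by one header line, including the trailing CRLF."""
    # latin-1 is enough for this sketch; the placeholder values are ASCII.
    return len(f"{name}: {value}\r\n".encode("latin-1"))


def header_block_size(headers: dict[str, str]) -> int:
    """Total bytes of all header lines (request line not counted)."""
    return sum(header_line_size(n, v) for n, v in headers.items())


def oversized_headers(headers: dict[str, str], limit: int) -> list[str]:
    """Names of headers whose single line is longer than `limit` bytes."""
    return [n for n, v in headers.items() if header_line_size(n, v) > limit]


# Placeholder request: a short Host header and a very long Cookie header.
example_headers = {
    "Host": "example.org",
    "Cookie": "a" * 3000,
}
HYPOTHETICAL_LIMIT = 2048  # not a real proxy setting

print(header_block_size(example_headers))
print(oversized_headers(example_headers, HYPOTHETICAL_LIMIT))
```

Dropping the Cookie header from `example_headers` brings the request back under the placeholder limit, which matches the behaviour described above.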

@akosiaris can we somehow tell Squid to ignore the line length in the headers? Or, ideally, add that as an exception only for ScienceDirect URLs?

Change 220140 had a related patch set uploaded (by Alexandros Kosiaris):
url_downloader: Increase request body/header size

https://gerrit.wikimedia.org/r/220140

Change 220140 merged by Alexandros Kosiaris:
url_downloader: Increase request body/header size

https://gerrit.wikimedia.org/r/220140

Squid proxy patch checked and works. I redeployed the translator activation patch and confirmed it is now working in production.

For some reason I couldn't add all the DOIs listed in Quiddity's comment in the same edit (it would still show me the error message).
With these edits https://en.wikipedia.org/w/index.php?title=User%3AElitre_%28WMF%29%2FSandbox&type=revision&diff=668298608&oldid=668298042 I managed to add 4 of those (I think 10.1016/j.celrep.2015.05.024 and 10.1016/j.tics.2007.09.009 are the ones still failing?).

Reopening. Yet again.

For the latter (10.1016/j.tics.2007.09.009), Citoid seems to be returning a valid reference: https://citoid.wikimedia.org/api?format=mediawiki&search=10.1016%2Fj.tics.2007.09.009

The former does not. The first time I tried it, Citoid returned a ref for which Zotero clearly failed to extract info. Now, however, it seems to be showing the same symptoms as before: I'm getting a 503 error. Yet again.

Change 220190 had a related patch set uploaded (by Mobrovac):
Hot-fix for production:

https://gerrit.wikimedia.org/r/220190

mobrovac lowered the priority of this task from Unbreak Now! to High. Jun 23 2015, 6:09 PM

I have deployed a hot fix into production which should keep Citoid from crashing for now if it receives an empty citation from Zotero even though Zotero declared it a successful translation (which is madness per se).
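
The kind of guard described here can be sketched roughly as follows. This is not the actual hot fix, which lives in the Gerrit change linked above; the function name, the exception, the assumed response shape (a list of item dicts), and the choice to treat a missing title as unusable are all assumptions made for illustration.

```python
class EmptyTranslationError(Exception):
    """Raised when a 'successful' translation carries no usable item."""


def first_usable_item(items):
    """Return the first item with a non-empty title, otherwise raise.

    Raising a specific error lets the caller fall back to another
    source instead of crashing on an empty result.
    """
    if not isinstance(items, list):
        raise EmptyTranslationError("response is not a list of items")
    for item in items:
        title = item.get("title") if isinstance(item, dict) else None
        if isinstance(title, str) and title.strip():
            return item
    raise EmptyTranslationError("translation reported success but returned no usable item")


try:
    first_usable_item([])  # what an empty "successful" translation looks like
except EmptyTranslationError as err:
    print(f"falling back: {err}")
```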

The problem is far from gone, unfortunately. The real reason why I was able to get a proper citation for some DOIs while @Elitre wasn't is, unfortunately, luck (as far-fetched as it may seem).

Some requests to ScienceDirect cause it to enter redirect hell where over 70 redirects need to be made in order to reach the destination. Moreover, the same DOI may or may not cause that, which suggests this depends both on the state of the system and the resource itself (the DOI in question).

To conclude, for now Citoid is stable and may return the correct citation. I'm keeping the ticket open since these redirects need further investigation.

Mvolz renamed this task from Enable science direct translator (if working) to Science direct translator displays unusual behaviour (occasionally gets stuck in redirect loop). Jul 12 2015, 6:41 PM

@mobrovac I think the redirects are likely a bug in Zotero proper, no? Maybe this should be resolved and just filed upstream?

Mvolz lowered the priority of this task from High to Medium. Jul 12 2015, 6:42 PM
mobrovac changed the task status from Open to Stalled. Jul 13 2015, 11:20 AM

@mobrovac I think the redirects are likely a bug in Zotero proper, no?

Most likely it's a cookie-handling problem in the translation server itself, not Zotero. Also, a thing that might contribute to its malfunctioning is us using a proxy. Not sure if that affects the SD server in any way.

Maybe this should be resolved and just filed upstream?

Mh, not sure filing a bug there will amount to much given the activity on that repo ceased a year ago, but, sure, go ahead :) Let's keep this ticket open though to keep track of the issue. Putting it as stalled.

Change 226318 had a related patch set uploaded (by Mobrovac):
Use the same cookie jar throughout a request's life

https://gerrit.wikimedia.org/r/226318

Change 226318 merged by jenkins-bot:
Use the same cookie jar throughout a request's life

https://gerrit.wikimedia.org/r/226318
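
A toy simulation of why the cookie jar matters for the redirects discussed in this task. The fake server is an assumption made purely for illustration and is not ScienceDirect's real behaviour: it keeps redirecting until the client presents the session cookie it set earlier. With a fresh jar on every hop the client never gets past the redirect; with one jar shared for the whole request it succeeds after a single redirect.

```python
MAX_HOPS = 10  # arbitrary cap for this sketch


def fake_server(url, cookies):
    """Hypothetical server: returns (status, location, set_cookies, body)."""
    if cookies.get("session") == "abc":
        return 200, None, {}, "article page"
    # No session cookie yet: set one and bounce the client back.
    return 302, url, {"session": "abc"}, ""


def fetch(url, share_jar):
    """Follow redirects; return (body, hops) or raise once the cap is hit."""
    jar = {}
    for hops in range(MAX_HOPS + 1):
        if not share_jar:
            jar = {}  # simulates creating a new cookie jar on every hop
        status, location, set_cookies, body = fake_server(url, jar)
        jar.update(set_cookies)
        if status == 200:
            return body, hops
        url = location
    raise RuntimeError(f"gave up after {MAX_HOPS} redirects")


print(fetch("http://example.invalid/article", share_jar=True))
try:
    fetch("http://example.invalid/article", share_jar=False)
except RuntimeError as err:
    print(err)
```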

This should now be fixed in the latest deployed version of Citoid, so resolving. Feel free to reopen if more problems are encountered.