|Resolved||Mvolz||T93785 DOI lookup returns a scraped "missing cookie" page instead of desired content|
|Resolved||Mvolz||T93876 Restructure requestFromDOI|
What's going on here is that we're following all the redirects from the DOI link (as a way of reaching the actual article URL), so we are correctly directed to http://www.nejm.org/doi/full/10.1056/NEJM200106073442306. That site doesn't like citoid's lack of cookie support and redirects us again, to http://www.nejm.org/action/cookieAbsent. That page is scraped correctly; it just isn't the page we wanted, which is why no 520 is returned.
One possible solution: only follow ONE redirect for DOI links. This is somewhat risky, as DOIs sometimes point to URLs that themselves redirect legitimately (I'm thinking of PLOS, which does this quite commonly).
The other bad thing about this is that there was no HTTP error for the cookie-absent redirect. There might have been such an error in an intermediate response; that needs more investigation. Whether following redirects helps or not varies from site to site. One solution is to simply follow one redirect at a time and inspect each response, and keep following as long as we don't get HTTP errors, maybe?
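To make the "one redirect at a time" idea concrete, here is a minimal sketch in Python (not citoid's actual code). The `fetch` callable, the `follow_step_by_step` name, the `dx.doi.org` URL, and the status codes in the fake response table are all assumptions for illustration; only the two nejm.org URLs come from this task.

```python
from typing import Callable, List, Optional, Tuple
from urllib.parse import urljoin

# Hypothetical fetcher type: takes a URL, returns (status code, Location header or None).
Fetcher = Callable[[str], Tuple[int, Optional[str]]]

REDIRECT_CODES = {301, 302, 303, 307, 308}


def follow_step_by_step(url: str, fetch: Fetcher, max_hops: int = 10) -> List[Tuple[str, int]]:
    """Follow redirects one hop at a time and record (url, status) for every hop."""
    chain: List[Tuple[str, int]] = []
    for _ in range(max_hops):
        status, location = fetch(url)
        chain.append((url, status))
        if status in REDIRECT_CODES and location:
            # Location may be relative, so resolve it against the current URL.
            url = urljoin(url, location)
            continue
        break  # Not a redirect (success or HTTP error): stop and let the caller decide.
    return chain


# Illustrative stand-in for the network; the status codes are made up.
FAKE_RESPONSES = {
    "http://dx.doi.org/10.1056/NEJM200106073442306":
        (302, "http://www.nejm.org/doi/full/10.1056/NEJM200106073442306"),
    "http://www.nejm.org/doi/full/10.1056/NEJM200106073442306":
        (302, "/action/cookieAbsent"),
    "http://www.nejm.org/action/cookieAbsent": (200, None),
}


def fake_fetch(url: str) -> Tuple[int, Optional[str]]:
    return FAKE_RESPONSES[url]


chain = follow_step_by_step("http://dx.doi.org/10.1056/NEJM200106073442306", fake_fetch)
```

Because every hop is recorded, the caller can see that the final 200 came from `cookieAbsent` rather than from the article page, which is exactly the case that currently slips through without an error.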
Following only one redirect is too tricky to put into practice, IMHO. A redirect can happen for various reasons, even on safe or verified sites.
How about following redirects and setting cookies? Is that undesirable for some reason?
Yeah, I think it's safe to follow one redirect for DOIs, try Zotero, and then follow all redirects after the fact. Or at least that's my current plan, see the task I merged this with (maybe I should unmerge and then put both as blocking for this one?).
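As a rough sketch of that plan, assuming "after the fact" means "only if Zotero returns nothing": the function below takes four hypothetical callables (`first_hop`, `try_zotero`, `follow_all`, `scrape`). None of them are real citoid or Zotero functions; they only mark where each step would plug in.

```python
from typing import Callable, Optional


def cite_from_doi(
    doi_url: str,
    first_hop: Callable[[str], str],              # hypothetical: follow exactly one redirect
    try_zotero: Callable[[str], Optional[dict]],  # hypothetical: None means no result
    follow_all: Callable[[str], str],             # hypothetical: follow remaining redirects
    scrape: Callable[[str], dict],                # hypothetical: fallback scraper
) -> dict:
    landing = first_hop(doi_url)       # step 1: one redirect only
    citation = try_zotero(landing)     # step 2: give Zotero the landing URL
    if citation is not None:
        return citation
    final_url = follow_all(landing)    # step 3: only now chase the rest
    return scrape(final_url)


# Stub usage: Zotero succeeds, so the remaining redirects are never followed.
result = cite_from_doi(
    "doi-url",
    first_hop=lambda u: "landing-url",
    try_zotero=lambda u: {"source": "zotero", "url": u},
    follow_all=lambda u: "final-url",
    scrape=lambda u: {"source": "scrape", "url": u},
)
```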
Yes, we should add cookie support.
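For illustration only, this is roughly what "follow redirects and keep cookies" looks like with Python's standard library; citoid itself is not written in Python, so treat this as a sketch of the idea rather than the actual fix. The `make_cookie_opener` name is made up; `CookieJar`, `HTTPCookieProcessor`, and `build_opener` are standard-library calls.

```python
import http.cookiejar
import urllib.request


def make_cookie_opener():
    """Build an opener that stores cookies and sends them back on later requests."""
    jar = http.cookiejar.CookieJar()
    # build_opener keeps the default redirect handler, so cookies set on one
    # hop of a redirect chain are sent on the following hops.
    opener = urllib.request.build_opener(urllib.request.HTTPCookieProcessor(jar))
    return opener, jar


opener, jar = make_cookie_opener()
# A real request would then be: opener.open(url, timeout=10)
```

With a jar in place, a site that sets a cookie and then redirects should see that cookie on the next request, instead of bouncing the client to a "cookie absent" page.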