
Results of a test with 10 random DOIs from en.wiki on the Beta site
Closed, Resolved | Public | 8 Estimated Story Points

Description

I took 10 random DOIs from en.wiki.

You can see the results here.

10.1038/scientificamerican0200-90 still not working, due to a 401 from the website, which points to a log-in-only .pdf. Not sure there's much we can do here.
10.2307/3677029 still says "JSTOR: An Error Occurred Setting Your User Cookie". Forked to T93877
10.1542/peds.2007-2362 still gives "Check date values in: |date= (help)". Forked to T95016
Hope this helps.

Event Timeline

Elitre raised the priority of this task from to Needs Triage.
Elitre updated the task description.
Elitre added a project: Citoid.
Elitre subscribed.

https://gerrit.wikimedia.org/r/#/c/199921/ should help when it gets merged, but I will check all of these with that change and give an update.

@Elitre, changes to the DOI converter are live, would you mind rechecking these?

@Elitre, scratch that, they've been merged but aren't live yet :). I'll let you know when they are.

10.1038/scientificamerican0200-90 still not working.
10.2307/3677029 still says "JSTOR: An Error Occurred Setting Your User Cookie".
10.1542/peds.2007-2362 still gives "Check date values in: |date= (help)".

Note that you get the JSTOR error when using a standard JSTOR URL as well, like http://www.jstor.org/stable/25177324 - should a separate bug be opened for that?

The date problem lies within CS1 templates such as Cite journal, which do not accept dates with slashes; it is quickly fixed by replacing the slashes with dashes. (This is explained in https://en.wikipedia.org/wiki/Help:CS1_errors#bad_date , and it's good that the error message on en.wiki links to the explanation page.) So I don't know if this should be solved by changing the templates or by having Citoid convert the date to an accepted format - I suspect the former. I can fork this task if Marielle says it's needed.
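To make the "replace slashes with dashes" fix concrete, here is a minimal Python sketch of what such a conversion could look like. It is not Citoid's code: the function name normalize_slash_date is made up for illustration, and it assumes the offending value has the year-first shape YYYY/MM/DD, which this task does not actually confirm. Anything that doesn't match is returned untouched.

```python
import re
from datetime import date


def normalize_slash_date(raw: str) -> str:
    """Turn a 'YYYY/MM/DD' string into 'YYYY-MM-DD'.

    Hypothetical helper: assumes a year-first, slash-separated date.
    Input that does not match, or is not a real calendar date, is
    returned unchanged so no information is lost.
    """
    match = re.fullmatch(r"(\d{4})/(\d{1,2})/(\d{1,2})", raw.strip())
    if match is None:
        return raw
    year, month, day = (int(part) for part in match.groups())
    try:
        # date() rejects impossible values such as month 13.
        return date(year, month, day).isoformat()
    except ValueError:
        return raw
```

Going through datetime.date rather than a plain string replace means an impossible date is left as-is for a human to look at instead of being passed along in a valid-looking format.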

It's possible we've been IP blocked by JSTOR, see: T88323.

@Elitre, please open a ticket for the dates; we should be validating date fields.

Re: cookies issue, we have a ticket for that: T93877

The problem with the Scientific American one (10.1038/scientificamerican0200-90) is that it resolves to a PDF that requires you to log in to access it.

http://www.nature.com/scientificamerican/journal/v282/n2/pdf/scientificamerican0200-90.pdf

That website gives us a 401 response code, so we won't try to scrape it. If it had resolved to an actual PDF, we wouldn't be able to scrape that either, because we can't scrape PDFs.
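Roughly, the decision described above could be sketched like this in Python. This is only an illustration, not the service's actual logic: the function name should_scrape is hypothetical, and treating every non-200 status as "don't scrape" and detecting PDFs via the Content-Type header are both assumptions.

```python
def should_scrape(status_code: int, content_type: str) -> bool:
    """Decide whether a resolved DOI target is worth scraping.

    Hypothetical sketch: only a 200 response is scraped, and PDF
    responses are skipped because they can't be scraped for metadata.
    """
    if status_code != 200:
        # e.g. the 401 returned by the log-in-only PDF
        return False
    # Drop parameters such as "; charset=utf-8" before comparing.
    media_type = content_type.split(";", 1)[0].strip().lower()
    return media_type != "application/pdf"
```

With this sketch, a 401 is rejected regardless of content type, and a 200 that serves application/pdf is rejected as well.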

In the future we may try to scrape data from certain non-200 response codes and try to generate a citation anyway, but probably not now. You could try filing this one as a site-specific issue, as all these DOIs are probably bad since they point to PDFs that we can't scrape, but I doubt much progress will be made here. Scientific American should make their DOIs point to abstract pages instead of log-in-only PDFs :).

Mvolz set Security to None.
Jdforrester-WMF claimed this task.
Jdforrester-WMF subscribed.

Checked all three DOIs; they now all work as expected.