Nature articles internal server error page getting scraped by Zotero and returning 200 status from Zotero
Open, Needs Triage, Public, 0 Estimated Story Points

Assigned To: None

Authored By: Mvolz, Jul 12 2015, 7:43 PM

Description

Originally reported by @Josve05a here: T1380#1448434

Related Objects

Mentioned In: T148320: Documenting process of writing Zotero translators through translation-servers
Mentioned Here: T1380: Nature.com articles gives citoid 401s.

Event Timeline

Mvolz created this task. Jul 12 2015, 7:43 PM

Mvolz raised the priority of this task to Needs Triage.

Mvolz updated the task description.

Mvolz added a project: Citoid.

Mvolz moved this task to Site specific issues on the Citoid board.

Mvolz added subscribers: Mvolz, Josve05a.

Restricted Application added a subscriber: Aklapper. Jul 12 2015, 7:43 PM

Mvolz set Security to None. Jul 12 2015, 7:43 PM

Mvolz added a subscriber: mobrovac.

Reported for translators here: https://github.com/zotero/translators/issues/918

@mobrovac maybe we should verify status at the location before sending to Zotero to avoid this sort of thing?
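
For illustration, the pre-check suggested above could look roughly like this. It is only a sketch, not Citoid's actual code: `check_then_translate`, `fetch_status`, and `ask_zotero` are hypothetical names, and the two callables stand in for whatever HTTP client and Zotero client are really used.

```python
def check_then_translate(url, fetch_status, ask_zotero):
    """Forward the URL to Zotero only if the page itself answers with a 2xx status.

    fetch_status(url) -> int and ask_zotero(url) -> list are hypothetical
    callables supplied by the caller; they are assumptions, not real APIs.
    """
    status = fetch_status(url)
    if not 200 <= status < 300:
        # An error page (for example a 500) never reaches Zotero,
        # so it cannot come back as a "successful" citation.
        raise LookupError(f"{url} responded with HTTP {status}; not sent to Zotero")
    return ask_zotero(url)
```

The trade-off is one extra request per lookup before Zotero is contacted at all.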

Josve05a added a project: Upstream. Jul 12 2015, 9:37 PM

In T105647#1448463, @Mvolz wrote:

@mobrovac maybe we should verify status at the location before sending to Zotero to avoid this sort of thing?

@Mvolz could you check that this is still an issue for us with the new, promisified version of Citoid? I don't think it should be.

Yes, I checked before reporting. Zotero gives us a 200 response with a filled-in citation; we only reject the promise if Zotero gives us a 200 and an empty response. Otherwise we trust the 200.
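
A minimal sketch of the rule described above, to show why the error page gets through. The function name `trust_zotero_200` and the sample data are hypothetical; this is not the actual Citoid implementation.

```python
def trust_zotero_200(items):
    """Apply the rule to the body of a 200 response from Zotero.

    Only an empty body is rejected; anything else is trusted as a citation.
    """
    if not items:
        raise ValueError("Zotero returned 200 with an empty response")
    return items


# Hypothetical citation scraped from an error page: it is non-empty,
# so under this rule it is accepted like any other result.
error_page_citation = [{"itemType": "webpage", "title": "Internal Server Error"}]
accepted = trust_zotero_200(error_page_citation)
```

Nothing in this check looks at the status the original site returned, which is why a scraped error page is indistinguishable from a real article here.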

Probably the best course of action would be to request results from Zotero and the resource itself in parallel. That way, if Zotero's results are not good, we already have a starting point for native scraping. Doing it in parallel also speeds things up.
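
The parallel approach could be sketched as follows, assuming asynchronous clients. All names (`cite_in_parallel`, `ask_zotero`, `fetch_page`, `scrape_natively`) are hypothetical, and "not good" is assumed here to mean that Zotero failed or returned nothing.

```python
import asyncio


async def cite_in_parallel(url, ask_zotero, fetch_page, scrape_natively):
    """Query Zotero and fetch the page concurrently, then pick a result.

    ask_zotero(url) and fetch_page(url) are hypothetical coroutines;
    scrape_natively(html) is a hypothetical synchronous fallback scraper.
    """
    # Start both requests at once; failures are returned as values, not raised.
    zotero_result, page_result = await asyncio.gather(
        ask_zotero(url), fetch_page(url), return_exceptions=True
    )

    zotero_ok = not isinstance(zotero_result, BaseException) and bool(zotero_result)
    if zotero_ok:
        return zotero_result

    # Zotero was no help: fall back to the page we already have in hand.
    if isinstance(page_result, BaseException):
        raise page_result
    return scrape_natively(page_result)
```

Because the page is already fetched by the time Zotero answers, the fallback costs no extra round trip.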

Josve05a updated the task description. Oct 15 2016, 11:40 PM

Restricted Application added a project: VisualEditor. Oct 15 2016, 11:40 PM

Soum213 mentioned this in T148320: Documenting process of writing Zotero translators through translation-servers. Oct 17 2016, 9:55 AM

Jdforrester-WMF moved this task from To Triage to External and Administrivia on the VisualEditor board. Nov 8 2016, 8:06 PM

Mvolz moved this task from Site specific issues to Zotero on the Citoid board. Jan 11 2017, 4:30 PM

Jdforrester-WMF set the point value for this task to 0. Feb 9 2017, 6:16 PM

@Mvolz, has this been resolved?

https://citoid.wikimedia.org/api?format=mediawiki&search=http://www.nature.com/ijo/journal/v38/n1/full/ijo201369a.html yields

[{"itemType":"journalArticle","notes":[],"tags":[],"title":"Perceived ‘healthiness’ of foods can influence consumers’ estimations of energy density and appropriate portion size","publicationTitle":"International Journal of Obesity","rights":"© 2013 Nature Publishing Group","volume":"38","issue":"1","pages":"106–112","date":"2014-01-01","DOI":"10.1038/ijo.2013.69","language":"en","url":"http://www.nature.com/ijo/journal/v38/n1/full/ijo201369a.html","abstractNote":"OBJECTIVE:\nMETHODS:\nRESULTS:\nCONCLUSIONS:","libraryCatalog":"www.nature.com","accessDate":"2017-03-14","author":[["G. P.","Faulkner"],["L. K.","Pourshahidi"],["J. M. W.","Wallace"],["M. A.","Kerr"],["T. A.","McCaffrey"],["M. B. E.","Livingstone"]],"source":["Zotero"]}]

https://github.com/zotero/translation-server/issues/15 is still open, though.

Mvolz added a comment. Mar 14 2017, 2:14 PM

This comment was removed by Mvolz.

A note on what other websites do: Google+ won't scrape it either; you get the error message "this link is not valid." Quora does not scrape it either. Facebook lets you attach it, though. In my experience, Facebook's user experience is better than Google+'s.

We stopped scraping error pages because we were sometimes sending 404 Not Found errors back to users, but not scraping a page with valid metadata is not ideal either. We might consider scraping error pages again. Thoughts?

Deskana removed a project: VisualEditor. Aug 30 2018, 10:23 AM

Mvolz moved this task from Zotero to Service on the Citoid board. Dec 11 2018, 6:01 PM
