
False positive testing
Status: Closed, Invalid (Public)

Description

https://docs.google.com/spreadsheets/d/1zRZeks0MrqfYFuvWp-IjmbytaVKHYyS97-vgvDCpMIg/edit?usp=sharing

Results from the version using restricted filters, including Citeseer but excluding Academia.edu and ResearchGate:

Spreadsheet columns: New link, Match, Copyvio, SHERPA/RoMEO?

Errors:
- Original not paywalled: 30 (51%)
- New link not free to read: 2 (3%)
- Original and new link do not match: 0 (0%)
- Copyright violation: 4 (7%)
- SHERPA/RoMEO version wrong: 13 (22%)

Summary

  • A significant percentage (51%) of “paywalled” links were in fact already free to read via the original link
  • In at least one case the tool recommended adding the same link that was already present
  • In almost all cases the new link recommended by the tool was free to read, although there were intermittent problems accessing Citeseer’s cached PDFs
  • In at least two cases, a thesis with the same title was returned for a journal article citation. In another case, a translation was returned for a citation to the original-language article.
  • The semi-automated tool continued to have issues with Citeseer publications uploaded by people other than the original authors. One instance may be an unauthorized translation of the original.
  • I used SHERPA/RoMEO, an index of publisher copyright and archiving policies, to estimate whether the included repositories complied with publisher contracts, keeping in mind that the author may always have negotiated terms other than the usual ones. A significant number of results in both the automated and semi-automated tests pointed to the wrong version (preprint vs. postprint vs. publisher version) or failed to meet other requirements imposed by the publisher. Most often, a publisher’s version had been uploaded where this was not permitted. SHERPA/RoMEO has an API that it may be possible to integrate.
  • arXiv and PMC are both generally accepted by publishers as standard in their respective fields. No other source representing a significant portion of the test sets could be considered “perfect” in terms of likely copyright compliance.
  • Assessing potential linkvio concerns will be challenging for non-experts, even with written instructions.
  • The error page for misspelled or nonexistent titles could be more helpful
  • Using the tool manually (i.e., entering an article title in the search box) was very slow, and the page occasionally did not load at all. In addition, a significant number of articles from large journals showed no proposed changes