Save requested URLs for Citoid's response
Open, Needs Triage, Public

Description

Citoid's API generates automatic citations given a URL (or another identifier such as a DOI or ISBN). Citoid's responses contain a "url" key, but the value under that key does not always match the requested URL: sometimes it is the URL that the request was redirected to, and sometimes it is the canonical URL.

To evaluate Citoid's performance for Web2Cit, we compare the metadata returned by Citoid for a given URL against the metadata from a manual citation of the same URL. However, when we requested the citation metadata from Citoid, we did not know that the URL in the response is not always the requested URL. We tried to recover the requested URL for each automatic citation, but the recovery was not entirely successful. For this reason, we performed the evaluation on a subset of 91k citations for which we are sure that the recovered requested URL matches the URL in Citoid's response.

To run the evaluation on the entire dataset of valid manual citations (see T301519), we should request the ~380k URLs from Citoid again, this time saving the requested URL alongside each response.
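
A minimal sketch of how the new run could keep the requested URL next to each response. It is only an illustration under stated assumptions: it assumes the public REST endpoint on English Wikipedia with the `mediawiki` format, and the function names, the User-Agent string, and the JSON Lines output layout are hypothetical, not the project's actual code.

```python
import json
import urllib.parse
import urllib.request

# Assumption: Citoid is queried through the public REST API on English Wikipedia.
CITOID_ENDPOINT = "https://en.wikipedia.org/api/rest_v1/data/citation/mediawiki/"


def build_api_url(requested_url):
    """Percent-encode the whole requested URL so it fits in one path segment."""
    return CITOID_ENDPOINT + urllib.parse.quote(requested_url, safe="")


def http_get_json(api_url):
    """Fetch api_url and decode the JSON body (hypothetical User-Agent)."""
    request = urllib.request.Request(
        api_url, headers={"User-Agent": "citoid-eval-sketch/0.1"}
    )
    with urllib.request.urlopen(request, timeout=30) as response:
        return json.load(response)


def fetch_record(requested_url, get_json=http_get_json):
    """Return Citoid's response wrapped together with the URL we asked for."""
    record = {"requested_url": requested_url, "citations": None, "error": None}
    try:
        record["citations"] = get_json(build_api_url(requested_url))
    except Exception as exc:  # keep failures too, so no requested URL is lost
        record["error"] = repr(exc)
    return record


def save_records(requested_urls, out_path, get_json=http_get_json):
    """Write one JSON object per line, one line per requested URL."""
    with open(out_path, "w", encoding="utf-8") as out:
        for requested_url in requested_urls:
            out.write(json.dumps(fetch_record(requested_url, get_json)) + "\n")
```

Because each record stores `requested_url` separately from the "url" key inside Citoid's response, a redirect or canonical URL in the response no longer hides which URL was actually requested. Failed requests are recorded as well, so the output always has one line per input URL.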