
Consider making a fast HEAD request to the actual resource before making a request to Citoid
Open, Low, Public

Description

As discussed by email, of the ~450k citations extracted, around 25% return a 404 "Not Found" response from Citoid.

However, unlike 400 errors (many of which we should be able to address with T301519), most 404 errors cannot be predicted without actually trying to fetch the problematic URL.

To avoid unnecessary requests to Citoid (see T301510), we could make a fast HEAD request to the target URL to check whether it exists before sending the request to Citoid.
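A minimal sketch of the pre-check, using only the Python standard library. The function names `head_status` and `should_query_citoid`, the 5-second timeout, and the choice to also treat 410 Gone as "not found" are illustrative assumptions, not part of any existing Citoid tooling. The decision is deliberately conservative: only a definitive "not found" skips Citoid, while anything ambiguous (405 from servers that reject HEAD, 403 from bot blocking, network errors) still goes through to Citoid.

```python
import http.client
import urllib.error
import urllib.request
from typing import Optional

# Statuses treated as "the resource does not exist".
# Including 410 Gone is an assumption; the task only discusses 404.
NOT_FOUND_STATUSES = {404, 410}


def head_status(url: str, timeout: float = 5.0) -> Optional[int]:
    """Send a HEAD request (hypothetical helper) and return the HTTP status.

    urllib's default handlers follow redirects. Returns None when no HTTP
    status could be obtained (malformed URL, DNS failure, refused
    connection, timeout, broken response).
    """
    try:
        # Building the Request inside the try: a malformed URL raises ValueError.
        request = urllib.request.Request(url, method="HEAD")
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as err:
        # 4xx/5xx responses are raised as HTTPError but still carry a status code.
        return err.code
    except (ValueError, OSError, http.client.HTTPException):
        # URLError and socket timeouts are OSError subclasses.
        return None


def should_query_citoid(status: Optional[int]) -> bool:
    """Decide (hypothetically) whether the Citoid request is worth making.

    Only a definitive "not found" skips Citoid; unknown outcomes fall through.
    """
    if status is None:
        return True
    return status not in NOT_FOUND_STATUSES
```

A caller would evaluate `should_query_citoid(head_status(url))` and send the request to Citoid only when it returns `True`. One caveat to keep in mind: some servers answer HEAD differently from GET, so a HEAD 404 is strong evidence but not proof that Citoid would also fail.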