
[SPIKE] Determine how specific we can be about logging why Citoid is failing
Open, Needs Triage, Public


As documented in T364594, there are 5 cases when Citoid can fail. [i]

This task involves investigating the extent to which we can implement logging that would enable us, the team responsible for maintaining Citoid, to determine which of these 5 cases [i] is causing a given failure.


As a member of the team responsible for Citoid functioning in the way volunteers depend on it to, I need to know why Citoid is failing, and how frequently it is failing in each particular way, so that I can determine how urgently we ought to prioritize a fix for a given issue or improve the experience for people who are encountering it.

Open question(s)

  • 1. With what level of specificity can we log/track why Citoid is failing and the frequency with which it is failing in a particular way?
    • Where "log/track" here means doing so in a way that would enable us to generate a real-time graph similar to what we currently do.
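
To make the "frequency" part of the question concrete, here is a minimal sketch of counting failures per reason in fixed time buckets, which is the shape of data a real-time graph would plot. The class name, the label strings, and the 60-second bucket size are all hypothetical; this is not how Citoid or its current metrics pipeline works, only an illustration of what would need to be emitted.

```python
from collections import Counter
from typing import Dict


class FailureCounter:
    """Counts failures per (time bucket, reason label). Hypothetical helper."""

    def __init__(self, bucket_seconds: int = 60) -> None:
        self.bucket_seconds = bucket_seconds
        self.counts: Counter = Counter()

    def record(self, label: str, timestamp: float) -> None:
        # Round the timestamp down to the start of its bucket.
        bucket = int(timestamp // self.bucket_seconds) * self.bucket_seconds
        self.counts[(bucket, label)] += 1

    def series(self, label: str) -> Dict[int, int]:
        # One time series per reason label: bucket start -> failure count.
        return {b: n for (b, l), n in sorted(self.counts.items()) if l == label}


counter = FailureCounter(bucket_seconds=60)
counter.record("possibly_blocked", 0.0)
counter.record("possibly_blocked", 59.9)
counter.record("possibly_blocked", 61.0)
print(counter.series("possibly_blocked"))
```

The level of specificity we can graph is then bounded by how many distinct labels we can assign reliably, which is what the failure cases below try to establish.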

Citoid failure cases

Copied from T364594

| Failure case | Description | Capable of being logged? | Logging implementation |
| --- | --- | --- | --- |
| JavaScript loaded pages | Metadata cannot be retrieved because JS is used to load the metadata and Citoid cannot interpret JS | Site may simply be empty (contain no metadata) or report a 403. | |
| Unsupported media type (.pdf, .mov) | | Yes | See T214038 and T365583. |
| GDPR pages | | Difficult to detect programmatically | User reported an issue with that here: T359059 |
| Paywalls with no metadata | Citoid is not able to access the metadata someone is requesting because the page is hosted behind a paywall | TBD | |
| IP blocked | The publisher/entity hosting the content for which someone is seeking metadata has blocked the IP address from which Citoid is making a request | If the server returns a 403, we could report it as "possibly blocked"; however, this is sometimes indistinguishable from being blocked because we don't use JS. A 429 (Too Many Requests) is also a possibility. | T364901 |
| Wikipedia Library (or other library) proxy | Metadata cannot be retrieved because Citoid is not authenticated to access content through the proxy. | | |
| ISBN | Errors when Citoid cannot generate a citation for an ISBN | Yes | T367163 |
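
As a rough illustration of how coarse the labels in the table might end up being, the sketch below maps the signals available on a failed request (HTTP status, content type, whether any metadata was found) to a loggable reason. The function name, the label strings, and the list of unsupported MIME types are assumptions made for this example; none of them come from Citoid's code.

```python
from typing import Optional

# Hypothetical label names, chosen for this example only.
UNSUPPORTED_MEDIA = "unsupported_media_type"
RATE_LIMITED = "rate_limited"
POSSIBLY_BLOCKED = "possibly_blocked"
NO_METADATA = "no_metadata"
UNKNOWN = "unknown"

# Assumed MIME types for the .pdf and .mov examples in the table.
UNSUPPORTED_TYPES = ("application/pdf", "video/quicktime")


def classify_failure(status: int, content_type: Optional[str], has_metadata: bool) -> str:
    """Map the observable signals of a failed request to a coarse label."""
    # Drop parameters such as "; charset=utf-8" before comparing.
    ctype = (content_type or "").split(";")[0].strip().lower()
    if ctype in UNSUPPORTED_TYPES:
        return UNSUPPORTED_MEDIA
    if status == 429:
        return RATE_LIMITED
    if status == 403:
        # Could be an IP block, or a site refusing clients without
        # JavaScript/cookies; the two look identical from here.
        return POSSIBLY_BLOCKED
    if 200 <= status < 300 and not has_metadata:
        # Could be a JS-loaded page, a paywall, or a GDPR interstitial.
        return NO_METADATA
    return UNKNOWN


print(classify_failure(403, "text/html", False))
```

Note that several rows of the table collapse into the same label here (`no_metadata`, `possibly_blocked`), which reflects the open question rather than resolving it.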

Event Timeline

Moving this to Doing based on the idea that Editing Engineering is looking into this together, from different angles in parallel.

Mvolz updated the task description.
ppelberg updated the task description.

I think it's going to be hard to determine some of these programmatically.

For instance, I discovered a website today that returns a 403 because it detects that we don't have JavaScript/cookies enabled; this cannot be distinguished programmatically from a 403 where we've been IP blocked. We could report this as "website has blocked access", but I worry that people might assume the block is deliberate rather than understanding that we just can't navigate the site.