This task involves enhancing Citoid's logging (Grafana) so that we can see which domains are failing most frequently.
Story
As a member of the team responsible for Citoid functioning in the way volunteers depend on it to, I need to know how often people attempt to cite content from specific domains and the rate at which those domains fail/succeed, so that I can prioritize which organizations to contact to resolve the failures I'm observing.
Requirements
- Add logging to Citoid that records which domain is being requested and whether reference data was successfully generated
- Present the data in a place/format (e.g. Grafana) where we can monitor and investigate error rates
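The two requirements above can be sketched roughly as follows. This is an illustration only, not Citoid's actual code: the record shape, the field names (`domain`, `success`) and the helpers `domainOf`, `buildLogRecord` and `summarize` are all hypothetical. The real `outgoingReqResult` fields are not specified in this task.

```typescript
// Hypothetical shape of one log record; field names are assumptions,
// not the actual outgoingReqResult schema.
interface RequestResult {
  domain: string;   // hostname of the URL the user tried to cite
  success: boolean; // whether reference data was generated
}

interface DomainStats {
  domain: string;
  total: number;
  failed: number;
  failureRate: number; // failed / total
}

// Extract the hostname from a requested URL.
// Unparseable input is grouped under a placeholder value.
function domainOf(requestedUrl: string): string {
  try {
    return new URL(requestedUrl).hostname;
  } catch (e) {
    return "invalid-url";
  }
}

// Build the structured record that would be handed to the logger.
function buildLogRecord(requestedUrl: string, success: boolean): RequestResult {
  return { domain: domainOf(requestedUrl), success };
}

// Aggregate records per domain, sorted by failure count (highest first).
// In practice this aggregation would happen in the dashboard, not in the service.
function summarize(records: RequestResult[]): DomainStats[] {
  const byDomain = new Map<string, DomainStats>();
  for (const r of records) {
    const s = byDomain.get(r.domain) ??
      { domain: r.domain, total: 0, failed: 0, failureRate: 0 };
    s.total += 1;
    if (!r.success) {
      s.failed += 1;
    }
    s.failureRate = s.failed / s.total;
    byDomain.set(r.domain, s);
  }
  return Array.from(byDomain.values()).sort((a, b) => b.failed - a.failed);
}
```

Logging one flat record per request, with the domain as its own field, is what would let a dashboard group and count by domain without parsing free-text messages.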
Background
In T362379 we learned several major news websites (NYT, NPR, Reuters...) block Citoid.
At present, we (the maintainers of Citoid) do not know which domains are failing most frequently. As a result, we're not able to determine which publishers we ought to prioritize contacting to address these failures.
QA
Editing Engineering to QA this as early as 27 June 2024.
TODO
- Bugfix so logs are serialised correctly [i]
- Index outgoingReqResult fields in logstash (how is this done?)
i. https://gerrit.wikimedia.org/r/c/mediawiki/services/citoid/+/1046713