
Log data about which domains are failing most frequently
Closed, Resolved, Public

Description

This task covers enhancing Citoid's logging, and surfacing the results somewhere we can monitor them (e.g. Grafana), so that we can see which domains are failing most frequently.

Story

As a member of the team responsible for Citoid functioning in the way volunteers depend on it to, I need to know how frequently people attempt to cite content from specific domains and the rate at which those domains fail/succeed, so that I can decide which organizations to prioritize contacting to resolve the failures I'm observing.

Requirements

  1. Add logging to Citoid that records which domain is being requested and whether reference data was successfully generated
  2. Present the data in a place/format (e.g. Grafana) where we can monitor and investigate error rates
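As a rough illustration of requirement 1, the sketch below builds one structured log record per outgoing request. It is not Citoid's actual code: the record shape, the function names (buildOutgoingReqResult, logOutgoingReqResult) and the level choice are assumptions made for this example. Only the field name outgoingReqResult comes from this task, and its real structure may differ.

```typescript
// Hypothetical record shape; field names are illustrative, not Citoid's schema.
interface OutgoingReqResult {
  domain: string;             // hostname of the requested URL
  status: number | null;      // HTTP status, or null if no response was received
  citationGenerated: boolean; // whether reference data was successfully generated
}

// Build the record for one outgoing request.
function buildOutgoingReqResult(
  requestedUrl: string,
  status: number | null,
  citationGenerated: boolean
): OutgoingReqResult {
  // The URL parser normalises the hostname to lower case.
  const domain = new URL(requestedUrl).hostname;
  return { domain, status, citationGenerated };
}

// Serialise the record as a single JSON line. Failures go out at "warn",
// successes at "info" (an assumed convention for this sketch).
function logOutgoingReqResult(
  result: OutgoingReqResult,
  write: (line: string) => void = console.log
): void {
  const level = result.citationGenerated ? "info" : "warn";
  write(JSON.stringify({ level, outgoingReqResult: result }));
}
```

Keeping the domain and the outcome as separate fields, rather than embedding them in a message string, is what would let a log backend filter and group on them.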

Background

In T362379 we learned several major news websites (NYT, NPR, Reuters...) block Citoid.

At present, we (the maintainers of Citoid) do not know which domains are failing most frequently. As a result, we're not able to determine which publishers we ought to prioritize contacting to address these failures.
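To make the prioritization question concrete, here is a minimal sketch of how per-request outcomes could be rolled up into a per-domain ranking. In practice this aggregation would more likely happen in the logging/dashboard tooling than in application code. The types and the function name rankDomainsByFailures are hypothetical.

```typescript
// Hypothetical input: one entry per outgoing request.
interface DomainOutcome {
  domain: string;
  success: boolean;
}

interface DomainStats {
  domain: string;
  requests: number;
  failures: number;
  failureRate: number; // failures / requests, between 0 and 1
}

// Count requests and failures per domain, then sort so that the domains
// with the most failures come first (ties broken by higher failure rate).
function rankDomainsByFailures(outcomes: DomainOutcome[]): DomainStats[] {
  const counts: Record<string, { requests: number; failures: number }> = {};
  for (const o of outcomes) {
    const entry = counts[o.domain] || (counts[o.domain] = { requests: 0, failures: 0 });
    entry.requests += 1;
    if (!o.success) {
      entry.failures += 1;
    }
  }
  return Object.keys(counts)
    .map((domain) => ({
      domain,
      requests: counts[domain].requests,
      failures: counts[domain].failures,
      failureRate: counts[domain].failures / counts[domain].requests,
    }))
    .sort((a, b) => b.failures - a.failures || b.failureRate - a.failureRate);
}
```

Sorting by absolute failure count first favors high-traffic publishers, which matches the goal of deciding whom to contact first. Sorting by failure rate alone would surface rarely requested domains instead.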

QA

Editing Engineering to QA this as early as 27 June 2024.

TODO

  • Bugfix so logs are serialised correctly [i]
  • Index outgoingReqResult fields in logstash (how is this done?)

i. https://gerrit.wikimedia.org/r/c/mediawiki/services/citoid/+/1046713

Event Timeline

Per discussion in standup, we'd like to log all requests that are going out as well as the errors that are returned.

Adding === Requirements based on the outcome of today's Editing Engineering discussions and what @Esanders shared with me offline.

Per today's team meeting, Editing Engineering will share an update about the status of this work tomorrow (Tuesday).

Change #1036250 had a related patch set uploaded (by Mvolz; author: Mvolz):

[mediawiki/services/citoid@master] Log HTTP errors on warn level

https://gerrit.wikimedia.org/r/1036250

Change #1041595 had a related patch set uploaded (by Mvolz; author: Mvolz):

[mediawiki/services/citoid@master] [WIP] Further improve debug level logging

https://gerrit.wikimedia.org/r/1041595

Change #1036250 merged by jenkins-bot:

[mediawiki/services/citoid@master] Log HTTP errors on warn level

https://gerrit.wikimedia.org/r/1036250

Editing Engineering will QA this work by way of looking at logs in ~2 weeks from Thursday, 13 June 2024.

Change #1042241 had a related patch set uploaded (by Mvolz; author: Mvolz):

[mediawiki/services/citoid@master] Avoid logging full err for warns

https://gerrit.wikimedia.org/r/1042241

Mvolz updated the task description.
ppelberg renamed this task from "Ensure access to data about which domains are failing most frequently" to "Log data about which domains are failing most frequently". Jun 13 2024, 4:53 PM

Change #1046713 had a related patch set uploaded (by Mvolz; author: Mvolz):

[mediawiki/services/citoid@master] Fix type errors in bunyan serialisers

https://gerrit.wikimedia.org/r/1046713

Change #1047954 had a related patch set uploaded (by Mvolz; author: Mvolz):

[operations/deployment-charts@master] Log at info level for citoid

https://gerrit.wikimedia.org/r/1047954

Change #1042241 merged by jenkins-bot:

[mediawiki/services/citoid@master] Avoid logging full err for warns

https://gerrit.wikimedia.org/r/1042241

Change #1047954 merged by jenkins-bot:

[operations/deployment-charts@master] Log at info level for citoid

https://gerrit.wikimedia.org/r/1047954

Change #1041595 merged by jenkins-bot:

[mediawiki/services/citoid@master] Improve debug level logging

https://gerrit.wikimedia.org/r/1041595