Calculate rate at which URL requests fail and succeed
Open, Needs Triage, Public

Description

Within the Citoid logstash dashboard there is data showing the top errors by requested domain.

This task seeks to help us better understand URL requests made through Citoid by calculating the overall rate at which requests of this sort fail.

Longer term, we'd like to be able to calculate the failure/success rates for other request types (e.g. ISBN, DOI, PubMed ID, etc.). Doing so, though, would require instrumentation we've not yet added. Ticket needed.

Knowing the above will help the Editing Team ensure we are allocating our attention to addressing the most prevalent failures.

Requirements

  1. Data showing rates at which Citoid URL requests fail and succeed within the following time intervals:
Last 24 hours

Excluding PDFs:

All formats: 80.6% success
No zotero format: 86.8% success
Only zotero format: 79.0% success

Including PDFs:
794 URLs ending in .pdf out of 93,785 + 794 = 94,579 total requests, i.e. about 0.8% of requests.

https://logstash.wikimedia.org/goto/5c8ad02c549621b486fbe96095109a82

Last 1 week

Excluding PDFs:

All formats: 76.4% success
No zotero format: 86.2% success
Only zotero format: 75% success

Last month

Excluding PDFs:

All formats: 76.1% success
No zotero format: 87.3% success
Only zotero format: 74.7% success

All formats and input types, including PDFs (metrics):
83.7% success

Last 6 months

TODO

  2. The data described in "1." needs to be filterable by request origin (Wikimedia and External/3rd Party)

Event Timeline

Mvolz updated the task description.

I've added some stats here.

I'm looking into getting some longer term data from metrics.

To get the above stats, I've added some new panels to the dashboard:

"Outgoing requests by status" in the lower right hand corner reports the percentage which are 200, so this is our "success" rate, and the failure rate is just 100% minus that.

outgoing.png (433×950 px, 32 KB)

Unfortunately, this includes some PDFs that are "succeeding" because they return us a 200, so for the above data I've filtered out the PDFs in a hacky way by excluding *.pdf on the request URL.
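
For anyone who wants to reproduce the calculation outside the dashboard, the logic described above can be sketched as follows. This is only an illustration, not the dashboard's actual query: the record shape (the `url` and `status` keys), the function name `success_rate`, and the sample data are all assumptions. Only the rules themselves (a 200 counts as success, failure is 100% minus that, and request URLs ending in .pdf are excluded) come from the comment above.

```python
def success_rate(records, excluded_suffixes=(".pdf",)):
    """Return the percentage of outgoing requests whose status is 200.

    `records` is an iterable of dicts with hypothetical keys "url" and
    "status". URLs ending in any of `excluded_suffixes` are dropped first,
    mirroring the "exclude *.pdf on the request URL" filter.
    Returns None when no records remain after filtering.
    """
    suffixes = tuple(excluded_suffixes)
    kept = [r for r in records if not r["url"].endswith(suffixes)]
    if not kept:
        return None
    ok = sum(1 for r in kept if r["status"] == 200)
    return 100.0 * ok / len(kept)


# Made-up sample records, for illustration only.
sample = [
    {"url": "https://example.org/article", "status": 200},
    {"url": "https://example.org/missing", "status": 404},
    {"url": "https://example.org/paper.pdf", "status": 200},
    {"url": "https://example.org/other", "status": 200},
]

rate = success_rate(sample)
print(f"success: {rate:.1f}%, failure: {100 - rate:.1f}%")
```

Passing more suffixes, for example `excluded_suffixes=(".pdf", ".mov")`, would exclude other media types in the same approximate way.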

I've added some stats here.

Wonderful.

I'm looking into getting some longer term data from metrics.

Excellent.

To get the above stats, I've added some new panels to the dashboard:

"Outgoing requests by status" in the lower right hand corner reports the percentage which are 200, so this is our "success" rate, and the failure rate is just 100% minus that.

outgoing.png (433×950 px, 32 KB)

Unfortunately, this includes some PDFs that are "succeeding" because they return us a 200, so for the above data I've filtered out the PDFs in a hacky way by excluding *.pdf on the request URL.

@Mvolz: 2 questions in response:

  1. What field did you filter out *.pdf from? [i]
  2. With .pdf approximately excluded, are there any other media types that we ought to try to exclude so that the metrics "Outgoing requests by status" returns are specific to URLs? [ii]

And hey! Thank you for sharing how you arrived at these metrics. Doing so helps equip me with the know-how I need to use this dashboard to ask and answer questions of Citoid independently.


i. This is the interface I'm assuming you used to create the filter you described above:

image.png (764×1 px, 109 KB)

ii. One thought: maybe the effort required to do this is not worthwhile considering how minimal we assume the traffic to be for formats like .mov?

isnotpdf.png (332×609 px, 20 KB)

This is the hacky way I was doing this. You could add another one for .mov too, but we probably also need to log an additional, separate field for the response code WE give (i.e. 415) to the outgoingReqResults thing, so we have both the response code they give us and the response code we give the user.

The 415 thing was deployed today, so we can just add logging on top of it.
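
As a rough illustration of the two-field idea above, a log entry could carry both codes side by side. This is a sketch under stated assumptions, not Citoid's actual logging code: the field names `upstream_status` and `returned_status`, the helper names, and the example URL are hypothetical. Only the outgoingReqResults label and the 415 response come from the comment above.

```python
import json


def outgoing_result_entry(url, upstream_status, returned_status):
    """Build one log entry holding both response codes.

    Field names are hypothetical, chosen for illustration only.
    """
    return {
        "msg": "outgoingReqResults",
        "url": url,
        "upstream_status": upstream_status,  # code the remote site gave us
        "returned_status": returned_status,  # code we give the user
    }


def is_true_success(entry):
    """Count a request as successful only if both codes are 200."""
    return entry["upstream_status"] == 200 and entry["returned_status"] == 200


# A PDF that the remote site serves with a 200 but that we reject with a 415.
entry = outgoing_result_entry("https://example.org/paper.pdf", 200, 415)
print(json.dumps(entry))
print(is_true_success(entry))
```

With both fields logged, a dashboard panel could filter on the code returned to the user instead of relying on the *.pdf wildcard.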