
Gather reliability data for the wsexport tool
Closed, Resolved | Public | 1 Estimated Story Points

Description

This project has wrapped up. I know we have UptimeRobot and probably other checks in place to help keep the tool reliable. This ticket is to gather some concrete data on the work we did, such as uptime before versus now, the number of errors we saw previously versus now, or anything else relevant.

This data will be published on the project page.

Event Timeline

Niharika triaged this task as Medium priority. (Jun 19 2019, 10:59 PM)
Niharika created this task.
aezell renamed this task from "Gather reliability data for the tool" to "Gather reliability data for the wsexport tool". (Jun 20 2019, 5:05 PM)
Niharika changed the point value for this task from 0 to 1. (Jun 20 2019, 5:10 PM)
Niharika moved this task from Needs Discussion to Up Next (June 3-21) on the Community-Tech board.
MusikAnimal subscribed.

Below I have broken down the data from UptimeRobot in half-month increments:

June 1 – June 15

Event  Date-Time            Reason                 Duration        Minutes
Down   2019-06-15 17:14:03  Connection Timeout     0 hrs, 45 mins  45
Down   2019-06-13 14:48:43  Connection Timeout     0 hrs, 1 mins   2
Down   2019-06-12 19:16:15  Connection Timeout     0 hrs, 1 mins   1
Down   2019-06-12 14:18:46  Connection Timeout     1 hrs, 42 mins  103
Down   2019-06-12 06:59:58  Connection Timeout     0 hrs, 3 mins   4
Down   2019-06-11 13:43:05  Connection Timeout     0 hrs, 9 mins   9
Down   2019-06-08 08:35:37  Connection Timeout     0 hrs, 12 mins  12
Down   2019-06-03 13:46:27  Connection Timeout     0 hrs, 2 mins   2
Down   2019-06-02 16:47:06  Connection Timeout     0 hrs, 1 mins   1

Total: 179 minutes of downtime

May 16 – May 31

Event  Date-Time            Reason                 Duration        Minutes
Down   2019-05-31 15:40:14  Connection Timeout     0 hrs, 1 mins   1
Down   2019-05-29 14:11:07  Connection Timeout     0 hrs, 1 mins   1
Down   2019-05-29 11:59:53  Internal Server Error  0 hrs, 24 mins  24
Down   2019-05-24 20:15:42  Connection Timeout     0 hrs, 1 mins   1
Down   2019-05-24 10:02:38  Connection Timeout     0 hrs, 1 mins   1
Down   2019-05-24 06:33:36  Connection Timeout     0 hrs, 1 mins   1
Down   2019-05-22 16:28:20  Connection Timeout     0 hrs, 1 mins   1
Down   2019-05-22 06:42:16  Connection Timeout     0 hrs, 6 mins   6
Down   2019-05-19 02:10:23  Connection Timeout     0 hrs, 10 mins  10
Down   2019-05-18 20:10:56  Connection Timeout     0 hrs, 3 mins   3
Down   2019-05-16 08:33:59  Connection Timeout     0 hrs, 1 mins   1

Total: 50 minutes of downtime

May 1 – May 15

Event  Date-Time            Reason                 Duration        Minutes
Down   2019-05-15 15:04:12  Connection Timeout     0 hrs, 28 mins  28
Down   2019-05-15 12:35:30  Connection Timeout     0 hrs, 2 mins   2
Down   2019-05-15 06:21:35  Connection Timeout     0 hrs, 1 mins   1
Down   2019-05-15 00:47:25  Not Found              0 hrs, 5 mins   5
Down   2019-05-14 16:18:28  Connection Timeout     0 hrs, 36 mins  37
Down   2019-05-13 21:36:40  Connection Timeout     0 hrs, 1 mins   1
Down   2019-05-13 02:58:01  Connection Timeout     0 hrs, 4 mins   4
Down   2019-05-12 20:30:34  Connection Timeout     0 hrs, 18 mins  19
Down   2019-05-12 17:56:13  Connection Timeout     0 hrs, 7 mins   8
Down   2019-05-12 15:52:55  Connection Timeout     0 hrs, 26 mins  27
Down   2019-05-12 11:11:14  Connection Timeout     0 hrs, 11 mins  12
Down   2019-05-12 05:57:10  Connection Timeout     0 hrs, 1 mins   1
Down   2019-05-11 20:58:14  Connection Timeout     0 hrs, 1 mins   1
Down   2019-05-11 15:34:59  Connection Timeout     0 hrs, 4 mins   4
Down   2019-05-11 14:23:33  Connection Timeout     0 hrs, 3 mins   4
Down   2019-05-11 05:14:02  Connection Timeout     0 hrs, 1 mins   1
Down   2019-05-10 15:51:05  Connection Timeout     0 hrs, 8 mins   8
Down   2019-05-10 15:42:44  Connection Timeout     0 hrs, 1 mins   1
Down   2019-05-10 14:04:20  Connection Timeout     0 hrs, 1 mins   1
Down   2019-05-09 17:20:15  Connection Timeout     0 hrs, 1 mins   1
Down   2019-05-09 15:01:54  Connection Timeout     0 hrs, 16 mins  16
Down   2019-05-09 14:47:28  Connection Timeout     0 hrs, 2 mins   3
Down   2019-05-09 11:18:59  Connection Timeout     0 hrs, 20 mins  20
Down   2019-05-09 10:25:36  Connection Timeout     0 hrs, 6 mins   6
Down   2019-05-09 08:47:11  Connection Timeout     0 hrs, 1 mins   1
Down   2019-05-09 04:41:45  Connection Timeout     0 hrs, 2 mins   2
Down   2019-05-08 09:48:40  Connection Timeout     0 hrs, 6 mins   7
Down   2019-05-07 19:54:54  Connection Timeout     0 hrs, 3 mins   3
Down   2019-05-07 06:32:43  Connection Timeout     0 hrs, 1 mins   1
Down   2019-05-07 02:01:52  Not Found              0 hrs, 6 mins   6
Down   2019-05-06 23:03:22  Connection Timeout     0 hrs, 1 mins   1
Down   2019-05-06 20:15:36  Connection Timeout     0 hrs, 2 mins   2
Down   2019-05-06 19:04:30  Connection Timeout     0 hrs, 1 mins   1
Down   2019-05-06 18:13:17  Connection Timeout     0 hrs, 23 mins  23
Down   2019-05-06 17:22:33  Connection Timeout     0 hrs, 10 mins  11
Down   2019-05-06 14:22:48  Connection Timeout     0 hrs, 48 mins  49
Down   2019-05-06 13:16:25  Connection Timeout     0 hrs, 19 mins  20
Down   2019-05-06 13:08:00  Connection Timeout     0 hrs, 1 mins   1
Down   2019-05-06 10:50:15  Connection Timeout     2 hrs, 10 mins  131
Down   2019-05-06 09:46:47  Connection Timeout     0 hrs, 5 mins   5
Down   2019-05-06 07:25:26  Connection Timeout     1 hrs, 4 mins   65
Down   2019-05-06 03:25:13  Connection Timeout     0 hrs, 15 mins  15
Down   2019-05-05 20:01:59  Connection Timeout     0 hrs, 8 mins   8
Down   2019-05-05 18:10:45  Connection Timeout     0 hrs, 21 mins  21
Down   2019-05-05 16:50:31  Connection Timeout     0 hrs, 11 mins  12
Down   2019-05-05 15:14:16  Connection Timeout     0 hrs, 39 mins  39
Down   2019-05-05 14:37:52  Connection Timeout     0 hrs, 14 mins  14
Down   2019-05-05 12:57:53  Connection Timeout     0 hrs, 15 mins  15
Down   2019-05-05 12:45:26  Connection Timeout     0 hrs, 5 mins   6
Down   2019-05-05 12:32:02  Connection Timeout     0 hrs, 1 mins   1
Down   2019-05-05 12:16:38  Connection Timeout     0 hrs, 3 mins   3
Down   2019-05-05 10:47:15  Connection Timeout     0 hrs, 4 mins   4
Down   2019-05-05 07:29:16  Connection Timeout     0 hrs, 15 mins  16
Down   2019-05-05 05:43:11  Connection Timeout     0 hrs, 1 mins   1
Down   2019-05-04 20:29:46  Connection Timeout     0 hrs, 7 mins   8
Down   2019-05-04 19:18:47  Connection Timeout     0 hrs, 15 mins  15
Down   2019-05-04 18:19:05  Connection Timeout     0 hrs, 3 mins   4
Down   2019-05-04 17:15:50  Connection Timeout     0 hrs, 7 mins   8
Down   2019-05-04 13:33:18  Connection Timeout     0 hrs, 1 mins   1
Down   2019-05-04 12:41:55  Connection Timeout     0 hrs, 12 mins  12
Down   2019-05-04 11:11:24  Connection Timeout     0 hrs, 1 mins   1
Down   2019-05-04 09:55:19  Connection Timeout     0 hrs, 4 mins   4
Down   2019-05-03 18:12:35  Connection Timeout     0 hrs, 5 mins   5
Down   2019-05-03 16:05:34  Connection Timeout     0 hrs, 2 mins   3
Down   2019-05-03 15:26:10  Connection Timeout     0 hrs, 17 mins  18
Down   2019-05-03 15:17:42  Connection Timeout     0 hrs, 1 mins   1
Down   2019-05-03 08:47:21  Connection Timeout     0 hrs, 34 mins  34
Down   2019-05-03 08:01:57  Connection Timeout     0 hrs, 2 mins   2
Down   2019-05-01 20:03:19  Connection Timeout     0 hrs, 6 mins   7
Down   2019-05-01 19:18:58  Connection Timeout     0 hrs, 3 mins   4
Down   2019-05-01 17:36:06  Connection Timeout     0 hrs, 38 mins  38
Down   2019-05-01 15:47:31  Connection Timeout     0 hrs, 46 mins  47
Down   2019-05-01 14:53:17  Connection Timeout     0 hrs, 3 mins   3
Down   2019-05-01 12:07:07  Connection Timeout     0 hrs, 24 mins  24
Down   2019-05-01 10:49:42  Connection Timeout     0 hrs, 1 mins   2
Down   2019-05-01 08:48:18  Connection Timeout     0 hrs, 4 mins   5

Total: 941 minutes of downtime
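Each window's total is the sum of its minutes column. A minimal Python sketch that reproduces the three totals, assuming the values are copied by hand from the tables (no UptimeRobot export format or API is assumed). The minutes column sometimes runs one minute higher than the "hrs, mins" column (e.g. "0 hrs, 36 mins" vs. 37), presumably due to rounding, so the totals depend on which column is summed.

```python
"""Sum UptimeRobot downtime per half-month window.

Values are copied by hand from the "Minutes" column of the tables above,
one entry per outage, newest first.
"""

WINDOWS: dict[str, list[int]] = {
    "May 1 - May 15": [
        28, 2, 1, 5, 37, 1, 4, 19, 8, 27, 12, 1, 1, 4, 4, 1, 8, 1, 1, 1,
        16, 3, 20, 6, 1, 2, 7, 3, 1, 6, 1, 2, 1, 23, 11, 49, 20, 1, 131, 5,
        65, 15, 8, 21, 12, 39, 14, 15, 6, 1, 3, 4, 16, 1, 8, 15, 4, 8, 1, 12,
        1, 4, 5, 3, 18, 1, 34, 2, 7, 4, 38, 47, 3, 24, 2, 5,
    ],
    "May 16 - May 31": [1, 1, 24, 1, 1, 1, 1, 6, 10, 3, 1],
    "June 1 - June 15": [45, 2, 1, 103, 4, 9, 12, 2, 1],
}


def total_downtime(windows: dict[str, list[int]]) -> dict[str, int]:
    """Return the total downtime in minutes for each window."""
    return {name: sum(minutes) for name, minutes in windows.items()}


if __name__ == "__main__":
    for name, total in total_downtime(WINDOWS).items():
        print(f"{name}: {total} minutes downtime ({len(WINDOWS[name])} outages)")
```

Summing the minutes column gives 941, 50 and 179 minutes, matching the totals reported above.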


We're still not perfect, but as you can see it is considerably better than it used to be: downtime fell from 941 minutes in May 1 – May 15 to 50 and 179 minutes in the following two half-month windows. Current uptime, at the time of writing (June 20), is 98.47% over the past 24 hours, 99.2% over the past week, and 99.42% over the past month.
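UptimeRobot expresses availability as a percentage of the monitored period. As a rough cross-check, the half-month downtime totals can be turned into the same kind of figure. This sketch assumes each window covers its full calendar span (15 or 16 days) and that the monitor ran continuously; it is an approximation, not how UptimeRobot itself computes its uptime ratios.

```python
"""Approximate uptime for each half-month window from total downtime."""

MINUTES_PER_DAY = 24 * 60

# (window, days in window, total downtime in minutes) from the tables above.
TOTALS = [
    ("May 1 - May 15", 15, 941),
    ("May 16 - May 31", 16, 50),
    ("June 1 - June 15", 15, 179),
]


def uptime_percent(days: int, downtime_minutes: int) -> float:
    """Percentage of the window during which the tool was not reported down."""
    window_minutes = days * MINUTES_PER_DAY
    return 100 * (1 - downtime_minutes / window_minutes)


if __name__ == "__main__":
    for name, days, down in TOTALS:
        print(f"{name}: {uptime_percent(days, down):.2f}% uptime")
```

Under those assumptions this gives about 95.64% for May 1 – May 15, 99.78% for May 16 – May 31, and 99.17% for June 1 – June 15.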

Thanks @MusikAnimal, that's great. Lots of our tools suffer from 1- to 9-minute outages, so if we leave those out we get the following counts of outages lasting 10 minutes or more:

May 1 – May 15: 27
May 16 – May 31: 2
June 1 – June 15: 3

Which is good. :)
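Dropping the short outages amounts to a threshold filter on the minutes column. A minimal sketch, assuming a cut-off of 10 minutes or more (inclusive) and using the two shorter tables as sample input; the same function applies unchanged to the May 1 – May 15 data.

```python
"""Count outages that last at least a given number of minutes."""

# "Minutes" column from two of the UptimeRobot tables above.
LATE_MAY = [1, 1, 24, 1, 1, 1, 1, 6, 10, 3, 1]
EARLY_JUNE = [45, 2, 1, 103, 4, 9, 12, 2, 1]


def count_long_outages(minutes: list[int], threshold: int = 10) -> int:
    """Count outages of `threshold` minutes or more, dropping shorter blips."""
    return sum(1 for m in minutes if m >= threshold)


if __name__ == "__main__":
    print("May 16 - May 31:", count_long_outages(LATE_MAY))
    print("June 1 - June 15:", count_long_outages(EARLY_JUNE))
```

Whether a 10-minute outage counts as "longer" matters at the boundary: with a strict "more than 10 minutes" cut-off (`threshold=11`), May 16 – May 31 drops from 2 to 1, because its 10-minute outage on May 19 no longer qualifies.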

I've been trying to figure out whether we can go back further, to before we set up UptimeRobot. I've run the access.log through awstats, and these are the monthly hit counts:

Month		Hits	Bandwidth
Jan 2018	47,421	21.25 GB
Feb 2018	47,182	21.70 GB
Mar 2018	46,985	26.27 GB
Apr 2018	47,086	19.99 GB
May 2018	43,318	19.48 GB
Jun 2018	40,456	17.62 GB
Jul 2018	51,929	20.21 GB
Aug 2018	44,517	37.38 GB
Sep 2018	41,581	19.11 GB
Oct 2018	54,756	25.44 GB
Nov 2018	54,341	22.52 GB
Dec 2018	39,691	17.04 GB
Jan 2019	64,419	31.84 GB
Feb 2019	69,244	35.14 GB
Mar 2019	69,184	40.97 GB
Apr 2019	84,636	39.20 GB
May 2019	94,737	48.90 GB

Which shows that, at the very least, traffic has increased recently: May 2019 had more than twice as many hits as May 2018 (94,737 vs. 43,318).
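The growth is easiest to see as a year-over-year ratio for each month. A small sketch using the awstats figures above; note that awstats "hits" count every request, so this measures load on the tool rather than distinct users.

```python
"""Year-over-year change in monthly hits, from the awstats table above."""

HITS_2018 = {"Jan": 47_421, "Feb": 47_182, "Mar": 46_985, "Apr": 47_086, "May": 43_318}
HITS_2019 = {"Jan": 64_419, "Feb": 69_244, "Mar": 69_184, "Apr": 84_636, "May": 94_737}


def yoy_ratio(month: str) -> float:
    """Ratio of 2019 hits to 2018 hits for the same month."""
    return HITS_2019[month] / HITS_2018[month]


if __name__ == "__main__":
    for month in HITS_2019:
        print(f"{month}: {yoy_ratio(month):.2f}x the 2018 hits")
```

Every month from January to May 2019 is above its 2018 counterpart, rising from about 1.36x in January to about 2.19x in May.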

But weirdly the 'HTTP Status Codes' report is empty for 2019, whereas it contains this for 2018:

404    Document Not Found            2,981       93.7 %      11.48 MB
500    Internal server Error           130        4 %        11.17 KB
301    Moved permanently (redirect)     64        2 %         0
206    Partial Content                   2        0 %       640.00 KB
400    Bad Request                       2        0 %       698 Bytes

But anyway there seems to be something strange going on with access.log in general; I've opened T226239: access.log is not being written for wsexport.

Oh, I think awstats only generates reports for completed years and months. So here are the 50x error counts for this year's completed months:

Month     HTTP 500      HTTP 502
2019-01   130 (4%)      0 (0%)
2019-02   115 (3.8%)    7 (0.2%)
2019-03   57 (1.8%)     40 (1.2%)
2019-04   119 (3.3%)    435 (12.3%)
2019-05   358 (3.4%)    464 (4.5%)

The percentages are of the total number of non-200 responses, excluding favicon 404s. So not super useful. :)
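The same percentages can be computed directly from the raw log, which also makes it easy to change the denominator (for example, to all responses rather than non-200 ones). A minimal sketch, assuming the access.log uses the common "combined" log format; the tool's actual log format is not confirmed here, and the sample lines in the usage check are made up.

```python
import re
import sys
from collections import Counter
from typing import Iterable

# Assumes combined-format lines such as:
#   1.2.3.4 - - [01/May/2019:00:00:01 +0000] "GET /path HTTP/1.1" 500 0 "-" "UA"
# The first quoted field is the request line; the status code follows it.
LINE_RE = re.compile(r'"(?P<request>[^"]*)" (?P<status>\d{3}) ')


def status_shares(lines: Iterable[str], codes=(500, 502)) -> dict:
    """Map each code to (count, % of non-200 responses), ignoring favicon 404s."""
    counts = Counter()
    for line in lines:
        match = LINE_RE.search(line)
        if match is None:
            continue  # not a combined-format entry; skip it
        status = int(match.group("status"))
        if status == 200:
            continue
        parts = match.group("request").split()
        path = parts[1].split("?")[0] if len(parts) > 1 else ""
        if status == 404 and path.endswith("favicon.ico"):
            continue  # favicon 404s are excluded from the denominator
        counts[status] += 1
    total = sum(counts.values())
    return {
        code: (counts[code], 100 * counts[code] / total if total else 0.0)
        for code in codes
    }


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python status_shares.py /path/to/access.log
    with open(sys.argv[1], encoding="utf-8", errors="replace") as log:
        for code, (n, pct) in status_shares(log).items():
            print(f"{code}: {n} ({pct:.1f}% of non-200 responses)")
```

Because the denominator is only non-200 responses, a month with many 404s makes the 5xx share look smaller even if the absolute number of errors is unchanged, which is one reason these percentages are hard to compare across months.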

Seems weird that there weren't any 502s in 2018. Maybe things were logged differently?

Thanks @MusikAnimal and @Samwilson. That's good data to share. I'll update the project page. :)