Page MenuHomePhabricator

Errors trying to fetch RDF from Wikidata
Closed, ResolvedPublic

Description

I am getting a lot of errors when trying to fetch RDF from Wikidata:

https://logstash.wikimedia.org/goto/cebf47846908e4cd6135c3127279a4f3

Most of them look like:

Oct 23 00:30:53 wdqs1003 wdqs-updater[20139]: Caused by: org.apache.http.NoHttpResponseException: www.wikidata.org:443 failed to respond

i.e. I am getting no response from Wikidata. RDF generation is supposed to be pretty lightweight, so I wonder why it may be timing out. May be the same issue as T202764 only it's different API here.

Event Timeline

Restricted Application added a subscriber: Aklapper. · View Herald Transcript

Still happening. Anybody from Performance-Team could assist?

So, an interesting thing: in at least some of these cases, there is a web request that is making it to wikidata, and that is returning a 200. I put together a jupyter notebook that pulls down a list of errors from logstash, and then queries the webrequest table in Hive for matches. Looking at a couple of hours of output, there's a mix of requests that actually did have a match and those that did not:

[2018-10-31 21:47:18] [/wiki/Special:EntityData/Q47409824.ttl?nocache=1540842786911&flavor=dump] Error received at 2018-10-29 19:53:06.912000+00:00
[2018-10-31 21:47:18] [/wiki/Special:EntityData/Q47409824.ttl?nocache=1540842786911&flavor=dump] executing Hive query looking for a matching web request
[2018-10-31 21:55:50] [/wiki/Special:EntityData/Q47409824.ttl?nocache=1540842786911&flavor=dump] No matching web requests found
[2018-10-31 21:55:50] [/wiki/Special:EntityData/Q57954108.ttl?nocache=1540842620660&flavor=dump] Error received at 2018-10-29 19:50:20.661000+00:00
[2018-10-31 21:55:50] [/wiki/Special:EntityData/Q57954108.ttl?nocache=1540842620660&flavor=dump] executing Hive query looking for a matching web request
[2018-10-31 21:57:28] [/wiki/Special:EntityData/Q57954108.ttl?nocache=1540842620660&flavor=dump] No matching web requests found
[2018-10-31 21:57:28] [/wiki/Special:EntityData/Q43203379.ttl?nocache=1540842620660&flavor=dump] Error received at 2018-10-29 19:50:20.661000+00:00
[2018-10-31 21:57:28] [/wiki/Special:EntityData/Q43203379.ttl?nocache=1540842620660&flavor=dump] executing Hive query looking for a matching web request
[2018-10-31 21:58:04] [/wiki/Special:EntityData/Q43203379.ttl?nocache=1540842620660&flavor=dump] No matching web requests found
[2018-10-31 21:58:04] [/wiki/Special:EntityData/Q42486692.ttl?nocache=1540842396024&flavor=dump] Error received at 2018-10-29 19:46:36.025000+00:00
[2018-10-31 21:58:04] [/wiki/Special:EntityData/Q42486692.ttl?nocache=1540842396024&flavor=dump] executing Hive query looking for a matching web request
[2018-10-31 21:58:40] [/wiki/Special:EntityData/Q42486692.ttl?nocache=1540842396024&flavor=dump] No matching web requests found
[2018-10-31 21:58:40] [/wiki/Special:EntityData/Q33890407.ttl?nocache=1540842396024&flavor=dump] Error received at 2018-10-29 19:46:36.025000+00:00
[2018-10-31 21:58:40] [/wiki/Special:EntityData/Q33890407.ttl?nocache=1540842396024&flavor=dump] executing Hive query looking for a matching web request
[2018-10-31 21:59:12] [/wiki/Special:EntityData/Q33890407.ttl?nocache=1540842396024&flavor=dump] No matching web requests found
[2018-10-31 21:59:12] [/wiki/Special:EntityData/Q50092000.ttl?nocache=1540841863246&flavor=dump] Error received at 2018-10-29 19:37:43.247000+00:00
[2018-10-31 21:59:12] [/wiki/Special:EntityData/Q50092000.ttl?nocache=1540841863246&flavor=dump] executing Hive query looking for a matching web request
[2018-10-31 21:59:44] [/wiki/Special:EntityData/Q50092000.ttl?nocache=1540841863246&flavor=dump] No matching web requests found
[2018-10-31 21:59:44] [/wiki/Special:EntityData/Q47410266.ttl?nocache=1540841271671&flavor=dump] Error received at 2018-10-29 19:27:51.672000+00:00
[2018-10-31 21:59:44] [/wiki/Special:EntityData/Q47410266.ttl?nocache=1540841271671&flavor=dump] executing Hive query looking for a matching web request
[2018-10-31 22:00:14] [/wiki/Special:EntityData/Q47410266.ttl?nocache=1540841271671&flavor=dump] No matching web requests found
[2018-10-31 22:00:14] [/wiki/Special:EntityData/Q12261543.ttl?nocache=1540841271667&flavor=dump] Error received at 2018-10-29 19:27:51.671000+00:00
[2018-10-31 22:00:14] [/wiki/Special:EntityData/Q12261543.ttl?nocache=1540841271667&flavor=dump] executing Hive query looking for a matching web request
[2018-10-31 22:00:54] [/wiki/Special:EntityData/Q12261543.ttl?nocache=1540841271667&flavor=dump] No matching web requests found
[2018-10-31 22:00:54] [/wiki/Special:EntityData/Q47621848.ttl?nocache=1540841210186&flavor=dump] Error received at 2018-10-29 19:26:50.187000+00:00
[2018-10-31 22:00:54] [/wiki/Special:EntityData/Q47621848.ttl?nocache=1540841210186&flavor=dump] executing Hive query looking for a matching web request
[2018-10-31 22:01:34] [/wiki/Special:EntityData/Q47621848.ttl?nocache=1540841210186&flavor=dump] No matching web requests found
[2018-10-31 22:01:34] [/wiki/Special:EntityData/Q47621848.ttl?nocache=1540841210182&flavor=dump] Error received at 2018-10-29 19:26:50.186000+00:00
[2018-10-31 22:01:34] [/wiki/Special:EntityData/Q47621848.ttl?nocache=1540841210182&flavor=dump] executing Hive query looking for a matching web request
[2018-10-31 22:02:08] [/wiki/Special:EntityData/Q47621848.ttl?nocache=1540841210182&flavor=dump] No matching web requests found
[2018-10-31 22:02:08] [/wiki/Special:EntityData/Q47621848.ttl?nocache=1540841210175&flavor=dump] Error received at 2018-10-29 19:26:50.182000+00:00
[2018-10-31 22:02:08] [/wiki/Special:EntityData/Q47621848.ttl?nocache=1540841210175&flavor=dump] executing Hive query looking for a matching web request
[2018-10-31 22:02:40] [/wiki/Special:EntityData/Q47621848.ttl?nocache=1540841210175&flavor=dump] No matching web requests found
[2018-10-31 22:02:40] [/wiki/Special:EntityData/Q33861879.ttl?nocache=1540841207654&flavor=dump] Error received at 2018-10-29 19:26:47.658000+00:00
[2018-10-31 22:02:40] [/wiki/Special:EntityData/Q33861879.ttl?nocache=1540841207654&flavor=dump] executing Hive query looking for a matching web request
[2018-10-31 22:03:13] [/wiki/Special:EntityData/Q33861879.ttl?nocache=1540841207654&flavor=dump] No matching web requests found
[2018-10-31 22:03:13] [/wiki/Special:EntityData/Q33861879.ttl?nocache=1540841207649&flavor=dump] Error received at 2018-10-29 19:26:47.654000+00:00
[2018-10-31 22:03:13] [/wiki/Special:EntityData/Q33861879.ttl?nocache=1540841207649&flavor=dump] executing Hive query looking for a matching web request
[2018-10-31 22:03:45] [/wiki/Special:EntityData/Q33861879.ttl?nocache=1540841207649&flavor=dump] No matching web requests found
[2018-10-31 22:03:45] [/wiki/Special:EntityData/Q33861879.ttl?nocache=1540841207645&flavor=dump] Error received at 2018-10-29 19:26:47.649000+00:00
[2018-10-31 22:03:45] [/wiki/Special:EntityData/Q33861879.ttl?nocache=1540841207645&flavor=dump] executing Hive query looking for a matching web request
[2018-10-31 22:04:15] [/wiki/Special:EntityData/Q33861879.ttl?nocache=1540841207645&flavor=dump] No matching web requests found
[2018-10-31 22:04:15] [/wiki/Special:EntityData/Q30883701.ttl?nocache=1540841119041&flavor=dump] Error received at 2018-10-29 19:25:19.048000+00:00
[2018-10-31 22:04:15] [/wiki/Special:EntityData/Q30883701.ttl?nocache=1540841119041&flavor=dump] executing Hive query looking for a matching web request
[2018-10-31 22:04:47] [/wiki/Special:EntityData/Q30883701.ttl?nocache=1540841119041&flavor=dump] No matching web requests found
[2018-10-31 22:04:47] [/wiki/Special:EntityData/Q45443276.ttl?nocache=1540841119041&flavor=dump] Error received at 2018-10-29 19:25:19.048000+00:00
[2018-10-31 22:04:47] [/wiki/Special:EntityData/Q45443276.ttl?nocache=1540841119041&flavor=dump] executing Hive query looking for a matching web request
[2018-10-31 22:06:15] [/wiki/Special:EntityData/Q45443276.ttl?nocache=1540841119041&flavor=dump] No matching web requests found
[2018-10-31 22:06:15] [/wiki/Special:EntityData/Q47410598.ttl?nocache=1540841109236&flavor=dump] Error received at 2018-10-29 19:25:09.238000+00:00
[2018-10-31 22:06:15] [/wiki/Special:EntityData/Q47410598.ttl?nocache=1540841109236&flavor=dump] executing Hive query looking for a matching web request
[2018-10-31 22:06:43] [/wiki/Special:EntityData/Q47410598.ttl?nocache=1540841109236&flavor=dump] No matching web requests found
[2018-10-31 22:06:43] [/wiki/Special:EntityData/Q47159887.ttl?nocache=1540841109236&flavor=dump] Error received at 2018-10-29 19:25:09.237000+00:00
[2018-10-31 22:06:43] [/wiki/Special:EntityData/Q47159887.ttl?nocache=1540841109236&flavor=dump] executing Hive query looking for a matching web request
[2018-10-31 22:07:13] [/wiki/Special:EntityData/Q47159887.ttl?nocache=1540841109236&flavor=dump] No matching web requests found
[2018-10-31 22:07:13] [/wiki/Special:EntityData/Q33864648.ttl?nocache=1540841091973&flavor=dump] Error received at 2018-10-29 19:24:51.974000+00:00
[2018-10-31 22:07:13] [/wiki/Special:EntityData/Q33864648.ttl?nocache=1540841091973&flavor=dump] executing Hive query looking for a matching web request
[2018-10-31 22:07:43] [/wiki/Special:EntityData/Q33864648.ttl?nocache=1540841091973&flavor=dump] No matching web requests found
[2018-10-31 22:07:43] [/wiki/Special:EntityData/Q45443372.ttl?nocache=1540840969393&flavor=dump] Error received at 2018-10-29 19:22:49.404000+00:00
[2018-10-31 22:07:43] [/wiki/Special:EntityData/Q45443372.ttl?nocache=1540840969393&flavor=dump] executing Hive query looking for a matching web request
[2018-10-31 22:08:29] [/wiki/Special:EntityData/Q45443372.ttl?nocache=1540840969393&flavor=dump] No matching web requests found
[2018-10-31 22:08:29] [/wiki/Special:EntityData/Q47410370.ttl?nocache=1540840969393&flavor=dump] Error received at 2018-10-29 19:22:49.404000+00:00
[2018-10-31 22:08:29] [/wiki/Special:EntityData/Q47410370.ttl?nocache=1540840969393&flavor=dump] executing Hive query looking for a matching web request
[2018-10-31 22:09:25] [/wiki/Special:EntityData/Q47410370.ttl?nocache=1540840969393&flavor=dump] No matching web requests found
[2018-10-31 22:09:25] [/wiki/Special:EntityData/Q57933018.ttl?nocache=1540840747322&flavor=dump] Error received at 2018-10-29 19:19:07.331000+00:00
[2018-10-31 22:09:25] [/wiki/Special:EntityData/Q57933018.ttl?nocache=1540840747322&flavor=dump] executing Hive query looking for a matching web request
[2018-10-31 22:09:59] [/wiki/Special:EntityData/Q57933018.ttl?nocache=1540840747322&flavor=dump] No matching web requests found
[2018-10-31 22:09:59] [/wiki/Special:EntityData/Q45443788.ttl?nocache=1540840747322&flavor=dump] Error received at 2018-10-29 19:19:07.331000+00:00
[2018-10-31 22:09:59] [/wiki/Special:EntityData/Q45443788.ttl?nocache=1540840747322&flavor=dump] executing Hive query looking for a matching web request
[2018-10-31 22:10:33] [/wiki/Special:EntityData/Q45443788.ttl?nocache=1540840747322&flavor=dump] No matching web requests found
[2018-10-31 22:10:33] [/wiki/Special:EntityData/Q46277938.ttl?nocache=1540840540656&flavor=dump] Error received at 2018-10-29 19:15:40.657000+00:00
[2018-10-31 22:10:33] [/wiki/Special:EntityData/Q46277938.ttl?nocache=1540840540656&flavor=dump] executing Hive query looking for a matching web request
[2018-10-31 22:11:10] [/wiki/Special:EntityData/Q46277938.ttl?nocache=1540840540656&flavor=dump] No matching web requests found
[2018-10-31 22:11:10] [/wiki/Special:EntityData/Q47843050.ttl?nocache=1540840540656&flavor=dump] Error received at 2018-10-29 19:15:40.657000+00:00
[2018-10-31 22:11:10] [/wiki/Special:EntityData/Q47843050.ttl?nocache=1540840540656&flavor=dump] executing Hive query looking for a matching web request
[2018-10-31 22:11:46] [/wiki/Special:EntityData/Q47843050.ttl?nocache=1540840540656&flavor=dump] No matching web requests found
[2018-10-31 22:11:46] [/wiki/Special:EntityData/Q57932869.ttl?nocache=1540840384686&flavor=dump] Error received at 2018-10-29 19:13:04.688000+00:00
[2018-10-31 22:11:46] [/wiki/Special:EntityData/Q57932869.ttl?nocache=1540840384686&flavor=dump] executing Hive query looking for a matching web request
[2018-10-31 22:12:16] [/wiki/Special:EntityData/Q57932869.ttl?nocache=1540840384686&flavor=dump] No matching web requests found
[2018-10-31 22:12:16] [/wiki/Special:EntityData/Q46388662.ttl?nocache=1540840384686&flavor=dump] Error received at 2018-10-29 19:13:04.687000+00:00
[2018-10-31 22:12:16] [/wiki/Special:EntityData/Q46388662.ttl?nocache=1540840384686&flavor=dump] executing Hive query looking for a matching web request
[2018-10-31 22:12:46] [/wiki/Special:EntityData/Q46388662.ttl?nocache=1540840384686&flavor=dump] No matching web requests found
[2018-10-31 22:12:46] [/wiki/Special:EntityData/Q45444559.ttl?nocache=1540839649508&flavor=dump] Error received at 2018-10-29 19:00:49.509000+00:00
[2018-10-31 22:12:46] [/wiki/Special:EntityData/Q45444559.ttl?nocache=1540839649508&flavor=dump] executing Hive query looking for a matching web request
[2018-10-31 22:13:30] [/wiki/Special:EntityData/Q45444559.ttl?nocache=1540839649508&flavor=dump] No matching web requests found
[2018-10-31 22:13:30] [/wiki/Special:EntityData/Q2573904.ttl?nocache=1540839649508&flavor=dump] Error received at 2018-10-29 19:00:49.509000+00:00
[2018-10-31 22:13:30] [/wiki/Special:EntityData/Q2573904.ttl?nocache=1540839649508&flavor=dump] executing Hive query looking for a matching web request
[2018-10-31 22:14:18] [/wiki/Special:EntityData/Q2573904.ttl?nocache=1540839649508&flavor=dump] No matching web requests found
[2018-10-31 22:14:18] [/wiki/Special:EntityData/Q3621409.ttl?nocache=1540839224497&flavor=dump] Error received at 2018-10-29 18:53:44.501000+00:00
[2018-10-31 22:14:18] [/wiki/Special:EntityData/Q3621409.ttl?nocache=1540839224497&flavor=dump] executing Hive query looking for a matching web request
[2018-10-31 22:14:44] [/wiki/Special:EntityData/Q3621409.ttl?nocache=1540839224497&flavor=dump] No matching web requests found
[2018-10-31 22:14:44] [/wiki/Special:EntityData/Q42236722.ttl?nocache=1540839224497&flavor=dump] Error received at 2018-10-29 18:53:44.501000+00:00
[2018-10-31 22:14:44] [/wiki/Special:EntityData/Q42236722.ttl?nocache=1540839224497&flavor=dump] executing Hive query looking for a matching web request
[2018-10-31 22:15:10] [/wiki/Special:EntityData/Q42236722.ttl?nocache=1540839224497&flavor=dump] No matching web requests found
[2018-10-31 22:15:10] [/wiki/Special:EntityData/Q39143621.ttl?nocache=1540839215434&flavor=dump] Error received at 2018-10-29 18:53:35.446000+00:00
[2018-10-31 22:15:10] [/wiki/Special:EntityData/Q39143621.ttl?nocache=1540839215434&flavor=dump] executing Hive query looking for a matching web request
[2018-10-31 22:15:46] [/wiki/Special:EntityData/Q39143621.ttl?nocache=1540839215434&flavor=dump] No matching web requests found
[2018-10-31 22:15:46] [/wiki/Special:EntityData/Q13357190.ttl?nocache=1540839084793&flavor=dump] Error received at 2018-10-29 18:51:24.801000+00:00
[2018-10-31 22:15:46] [/wiki/Special:EntityData/Q13357190.ttl?nocache=1540839084793&flavor=dump] executing Hive query looking for a matching web request
[2018-10-31 22:16:22] [/wiki/Special:EntityData/Q13357190.ttl?nocache=1540839084793&flavor=dump] No matching web requests found
[2018-10-31 22:16:22] [/wiki/Special:EntityData/Q8375902.ttl?nocache=1540839084793&flavor=dump] Error received at 2018-10-29 18:51:24.801000+00:00
[2018-10-31 22:16:22] [/wiki/Special:EntityData/Q8375902.ttl?nocache=1540839084793&flavor=dump] executing Hive query looking for a matching web request
[2018-10-31 22:16:56] [/wiki/Special:EntityData/Q8375902.ttl?nocache=1540839084793&flavor=dump] No matching web requests found
[2018-10-31 22:16:56] [/wiki/Special:EntityData/Q42459160.ttl?nocache=1540837422934&flavor=dump] Error received at 2018-10-29 18:23:42.935000+00:00
[2018-10-31 22:16:56] [/wiki/Special:EntityData/Q42459160.ttl?nocache=1540837422934&flavor=dump] executing Hive query looking for a matching web request
[2018-10-31 22:17:43] [/wiki/Special:EntityData/Q42459160.ttl?nocache=1540837422934&flavor=dump] No matching web requests found
[2018-10-31 22:17:43] [/wiki/Special:EntityData/Q42459160.ttl?nocache=1540837422935&flavor=dump] Error received at 2018-10-29 18:23:42.935000+00:00
[2018-10-31 22:17:43] [/wiki/Special:EntityData/Q42459160.ttl?nocache=1540837422935&flavor=dump] executing Hive query looking for a matching web request
[2018-10-31 22:18:27] [/wiki/Special:EntityData/Q42459160.ttl?nocache=1540837422935&flavor=dump] No matching web requests found
[2018-10-31 22:18:27] [/wiki/Special:EntityData/Q42459160.ttl?nocache=1540837422929&flavor=dump] Error received at 2018-10-29 18:23:42.933000+00:00
[2018-10-31 22:18:27] [/wiki/Special:EntityData/Q42459160.ttl?nocache=1540837422929&flavor=dump] executing Hive query looking for a matching web request
[2018-10-31 22:19:11] [/wiki/Special:EntityData/Q42459160.ttl?nocache=1540837422929&flavor=dump] No matching web requests found
[2018-10-31 22:19:11] [/wiki/Special:EntityData/Q6905681.ttl?nocache=1540837170626&flavor=dump] Error received at 2018-10-29 18:19:30.627000+00:00
[2018-10-31 22:19:11] [/wiki/Special:EntityData/Q6905681.ttl?nocache=1540837170626&flavor=dump] executing Hive query looking for a matching web request
[2018-10-31 22:19:49] [/wiki/Special:EntityData/Q6905681.ttl?nocache=1540837170626&flavor=dump] No matching web requests found
[2018-10-31 22:19:49] [/wiki/Special:EntityData/Q6905681.ttl?nocache=1540837170625&flavor=dump] Error received at 2018-10-29 18:19:30.626000+00:00
[2018-10-31 22:19:49] [/wiki/Special:EntityData/Q6905681.ttl?nocache=1540837170625&flavor=dump] executing Hive query looking for a matching web request
[2018-10-31 22:20:27] [/wiki/Special:EntityData/Q6905681.ttl?nocache=1540837170625&flavor=dump] No matching web requests found
[2018-10-31 22:20:27] [/wiki/Special:EntityData/Q33872363.ttl?nocache=1540837032063&flavor=dump] Error received at 2018-10-29 18:17:12.064000+00:00
[2018-10-31 22:20:27] [/wiki/Special:EntityData/Q33872363.ttl?nocache=1540837032063&flavor=dump] executing Hive query looking for a matching web request
[2018-10-31 22:21:05] [/wiki/Special:EntityData/Q33872363.ttl?nocache=1540837032063&flavor=dump] No matching web requests found
[2018-10-31 22:21:05] [/wiki/Special:EntityData/Q57953232.ttl?nocache=1540837032063&flavor=dump] Error received at 2018-10-29 18:17:12.064000+00:00
[2018-10-31 22:21:05] [/wiki/Special:EntityData/Q57953232.ttl?nocache=1540837032063&flavor=dump] executing Hive query looking for a matching web request
[2018-10-31 22:21:39] [/wiki/Special:EntityData/Q57953232.ttl?nocache=1540837032063&flavor=dump] No matching web requests found
[2018-10-31 22:21:39] [/wiki/Special:EntityData/Q48348733.ttl?nocache=1540837032063&flavor=dump] Error received at 2018-10-29 18:17:12.064000+00:00
[2018-10-31 22:21:39] [/wiki/Special:EntityData/Q48348733.ttl?nocache=1540837032063&flavor=dump] executing Hive query looking for a matching web request
[2018-10-31 22:22:05] [/wiki/Special:EntityData/Q48348733.ttl?nocache=1540837032063&flavor=dump] No matching web requests found
[2018-10-31 22:22:05] [/wiki/Special:EntityData/Q33874511.ttl?nocache=776740189&flavor=dump] Error received at 2018-10-29 18:10:46.377000+00:00
[2018-10-31 22:22:05] [/wiki/Special:EntityData/Q33874511.ttl?nocache=776740189&flavor=dump] executing Hive query looking for a matching web request
[2018-10-31 22:22:31] [/wiki/Special:EntityData/Q33874511.ttl?nocache=776740189&flavor=dump] 1 matching requests found
[2018-10-31 22:22:31] [/wiki/Special:EntityData/Q33874323.ttl?nocache=776740467&flavor=dump] Error received at 2018-10-29 18:10:46.377000+00:00
[2018-10-31 22:22:31] [/wiki/Special:EntityData/Q33874323.ttl?nocache=776740467&flavor=dump] executing Hive query looking for a matching web request
[2018-10-31 22:23:00] [/wiki/Special:EntityData/Q33874323.ttl?nocache=776740467&flavor=dump] 1 matching requests found
[2018-10-31 22:23:00] [/wiki/Special:EntityData/Q46391626.ttl?nocache=776740838&flavor=dump] Error received at 2018-10-29 18:10:46.377000+00:00
[2018-10-31 22:23:00] [/wiki/Special:EntityData/Q46391626.ttl?nocache=776740838&flavor=dump] executing Hive query looking for a matching web request
[2018-10-31 22:23:26] [/wiki/Special:EntityData/Q46391626.ttl?nocache=776740838&flavor=dump] 1 matching requests found
[2018-10-31 22:23:26] [/wiki/Special:EntityData/Q46391496.ttl?nocache=1540836643970&flavor=dump] Error received at 2018-10-29 18:10:43.971000+00:00
[2018-10-31 22:23:26] [/wiki/Special:EntityData/Q46391496.ttl?nocache=1540836643970&flavor=dump] executing Hive query looking for a matching web request
[2018-10-31 22:23:52] [/wiki/Special:EntityData/Q46391496.ttl?nocache=1540836643970&flavor=dump] No matching web requests found
[2018-10-31 22:23:52] [/wiki/Special:EntityData/Q46391496.ttl?nocache=1540836643971&flavor=dump] Error received at 2018-10-29 18:10:43.971000+00:00
[2018-10-31 22:23:52] [/wiki/Special:EntityData/Q46391496.ttl?nocache=1540836643971&flavor=dump] executing Hive query looking for a matching web request
[2018-10-31 22:24:18] [/wiki/Special:EntityData/Q46391496.ttl?nocache=1540836643971&flavor=dump] 1 matching requests found
[2018-10-31 22:24:18] [/wiki/Special:EntityData/Q46391496.ttl?nocache=1540836643969&flavor=dump] Error received at 2018-10-29 18:10:43.970000+00:00
[2018-10-31 22:24:18] [/wiki/Special:EntityData/Q46391496.ttl?nocache=1540836643969&flavor=dump] executing Hive query looking for a matching web request
[2018-10-31 22:24:44] [/wiki/Special:EntityData/Q46391496.ttl?nocache=1540836643969&flavor=dump] No matching web requests found
[2018-10-31 22:24:44] [/wiki/Special:EntityData/Q33873878.ttl?nocache=1540836531431&flavor=dump] Error received at 2018-10-29 18:08:51.432000+00:00
[2018-10-31 22:24:44] [/wiki/Special:EntityData/Q33873878.ttl?nocache=1540836531431&flavor=dump] executing Hive query looking for a matching web request
[2018-10-31 22:25:08] [/wiki/Special:EntityData/Q33873878.ttl?nocache=1540836531431&flavor=dump] No matching web requests found
[2018-10-31 22:25:08] [/wiki/Special:EntityData/Q42577928.ttl?nocache=1540836431425&flavor=dump] Error received at 2018-10-29 18:07:11.426000+00:00
[2018-10-31 22:25:08] [/wiki/Special:EntityData/Q42577928.ttl?nocache=1540836431425&flavor=dump] executing Hive query looking for a matching web request
[2018-10-31 22:25:32] [/wiki/Special:EntityData/Q42577928.ttl?nocache=1540836431425&flavor=dump] No matching web requests found
[2018-10-31 22:25:32] [/wiki/Special:EntityData/Q57952786.ttl?nocache=1540836431425&flavor=dump] Error received at 2018-10-29 18:07:11.426000+00:00
[2018-10-31 22:25:32] [/wiki/Special:EntityData/Q57952786.ttl?nocache=1540836431425&flavor=dump] executing Hive query looking for a matching web request
[2018-10-31 22:25:58] [/wiki/Special:EntityData/Q57952786.ttl?nocache=1540836431425&flavor=dump] No matching web requests found
[2018-10-31 22:25:58] [/wiki/Special:EntityData/Q49253890.ttl?nocache=1540835449543&flavor=dump] Error received at 2018-10-29 17:50:49.544000+00:00
[2018-10-31 22:25:58] [/wiki/Special:EntityData/Q49253890.ttl?nocache=1540835449543&flavor=dump] executing Hive query looking for a matching web request
[2018-10-31 22:33:18] [/wiki/Special:EntityData/Q49253890.ttl?nocache=1540835449543&flavor=dump] No matching web requests found
[2018-10-31 22:33:18] [/wiki/Special:EntityData/Q47414661.ttl?nocache=1540835449544&flavor=dump] Error received at 2018-10-29 17:50:49.544000+00:00
[2018-10-31 22:33:18] [/wiki/Special:EntityData/Q47414661.ttl?nocache=1540835449544&flavor=dump] executing Hive query looking for a matching web request
[2018-10-31 22:33:54] [/wiki/Special:EntityData/Q47414661.ttl?nocache=1540835449544&flavor=dump] 1 matching requests found
[2018-10-31 22:33:54] [/wiki/Special:EntityData/Q42208879.ttl?nocache=1540835449544&flavor=dump] Error received at 2018-10-29 17:50:49.544000+00:00
[2018-10-31 22:33:54] [/wiki/Special:EntityData/Q42208879.ttl?nocache=1540835449544&flavor=dump] executing Hive query looking for a matching web request
[2018-10-31 22:34:33] [/wiki/Special:EntityData/Q42208879.ttl?nocache=1540835449544&flavor=dump] 1 matching requests found
[2018-10-31 22:34:33] [/wiki/Special:EntityData/Q49254016.ttl?nocache=776727015&flavor=dump] Error received at 2018-10-29 17:49:56.981000+00:00
[2018-10-31 22:34:33] [/wiki/Special:EntityData/Q49254016.ttl?nocache=776727015&flavor=dump] executing Hive query looking for a matching web request
[2018-10-31 22:35:13] [/wiki/Special:EntityData/Q49254016.ttl?nocache=776727015&flavor=dump] 1 matching requests found
[2018-10-31 22:35:13] [/wiki/Special:EntityData/Q33877177.ttl?nocache=776726538&flavor=dump] Error received at 2018-10-29 17:49:56.977000+00:00
[2018-10-31 22:35:13] [/wiki/Special:EntityData/Q33877177.ttl?nocache=776726538&flavor=dump] executing Hive query looking for a matching web request
[2018-10-31 22:35:57] [/wiki/Special:EntityData/Q33877177.ttl?nocache=776726538&flavor=dump] 1 matching requests found
[2018-10-31 22:35:57] [/wiki/Special:EntityData/Q33223556.ttl?nocache=776727241&flavor=dump] Error received at 2018-10-29 17:49:56.977000+00:00
[2018-10-31 22:35:57] [/wiki/Special:EntityData/Q33223556.ttl?nocache=776727241&flavor=dump] executing Hive query looking for a matching web request
[2018-10-31 22:36:35] [/wiki/Special:EntityData/Q33223556.ttl?nocache=776727241&flavor=dump] 1 matching requests found
[2018-10-31 22:36:35] [/wiki/Special:EntityData/Q25653026.ttl?nocache=1540835298285&flavor=dump] Error received at 2018-10-29 17:48:18.286000+00:00
[2018-10-31 22:36:35] [/wiki/Special:EntityData/Q25653026.ttl?nocache=1540835298285&flavor=dump] executing Hive query looking for a matching web request
[2018-10-31 22:37:32] [/wiki/Special:EntityData/Q25653026.ttl?nocache=1540835298285&flavor=dump] No matching web requests found
[2018-10-31 22:37:32] [/wiki/Special:EntityData/Q42557460.ttl?nocache=1540835298285&flavor=dump] Error received at 2018-10-29 17:48:18.286000+00:00
[2018-10-31 22:37:32] [/wiki/Special:EntityData/Q42557460.ttl?nocache=1540835298285&flavor=dump] executing Hive query looking for a matching web request
[2018-10-31 22:38:00] [/wiki/Special:EntityData/Q42557460.ttl?nocache=1540835298285&flavor=dump] No matching web requests found
[2018-10-31 22:38:00] [/wiki/Special:EntityData/Q45447710.ttl?nocache=1540834899150&flavor=dump] Error received at 2018-10-29 17:41:39.151000+00:00
[2018-10-31 22:38:00] [/wiki/Special:EntityData/Q45447710.ttl?nocache=1540834899150&flavor=dump] executing Hive query looking for a matching web request
[2018-10-31 22:38:28] [/wiki/Special:EntityData/Q45447710.ttl?nocache=1540834899150&flavor=dump] No matching web requests found
[2018-10-31 22:38:28] [/wiki/Special:EntityData/Q38603532.ttl?nocache=1540834526976&flavor=dump] Error received at 2018-10-29 17:35:26.977000+00:00
[2018-10-31 22:38:28] [/wiki/Special:EntityData/Q38603532.ttl?nocache=1540834526976&flavor=dump] executing Hive query looking for a matching web request
[2018-10-31 22:38:56] [/wiki/Special:EntityData/Q38603532.ttl?nocache=1540834526976&flavor=dump] No matching web requests found
[2018-10-31 22:38:56] [/wiki/Special:EntityData/Q42175937.ttl?nocache=1540834526976&flavor=dump] Error received at 2018-10-29 17:35:26.977000+00:00
[2018-10-31 22:38:56] [/wiki/Special:EntityData/Q42175937.ttl?nocache=1540834526976&flavor=dump] executing Hive query looking for a matching web request
[2018-10-31 22:39:24] [/wiki/Special:EntityData/Q42175937.ttl?nocache=1540834526976&flavor=dump] No matching web requests found
[2018-10-31 22:39:24] [/wiki/Special:EntityData/Q13236321.ttl?nocache=1540834339416&flavor=dump] Error received at 2018-10-29 17:32:19.417000+00:00
[2018-10-31 22:39:24] [/wiki/Special:EntityData/Q13236321.ttl?nocache=1540834339416&flavor=dump] executing Hive query looking for a matching web request
[2018-10-31 22:39:52] [/wiki/Special:EntityData/Q13236321.ttl?nocache=1540834339416&flavor=dump] No matching web requests found
[2018-10-31 22:39:52] [/wiki/Special:EntityData/Q45588090.ttl?nocache=1540834339416&flavor=dump] Error received at 2018-10-29 17:32:19.417000+00:00
[2018-10-31 22:39:52] [/wiki/Special:EntityData/Q45588090.ttl?nocache=1540834339416&flavor=dump] executing Hive query looking for a matching web request
[2018-10-31 22:40:46] [/wiki/Special:EntityData/Q45588090.ttl?nocache=1540834339416&flavor=dump] No matching web requests found
[2018-10-31 22:40:46] [/wiki/Special:EntityData/Q45448015.ttl?nocache=1540834339416&flavor=dump] Error received at 2018-10-29 17:32:19.417000+00:00
[2018-10-31 22:40:46] [/wiki/Special:EntityData/Q45448015.ttl?nocache=1540834339416&flavor=dump] executing Hive query looking for a matching web request
[2018-10-31 22:42:10] [/wiki/Special:EntityData/Q45448015.ttl?nocache=1540834339416&flavor=dump] No matching web requests found
[2018-10-31 22:42:10] [/wiki/Special:EntityData/Q39309886.ttl?nocache=1540834314181&flavor=dump] Error received at 2018-10-29 17:31:54.182000+00:00
[2018-10-31 22:42:10] [/wiki/Special:EntityData/Q39309886.ttl?nocache=1540834314181&flavor=dump] executing Hive query looking for a matching web request
[2018-10-31 22:43:38] [/wiki/Special:EntityData/Q39309886.ttl?nocache=1540834314181&flavor=dump] No matching web requests found
[2018-10-31 22:43:38] [/wiki/Special:EntityData/Q45448077.ttl?nocache=1540834314181&flavor=dump] Error received at 2018-10-29 17:31:54.182000+00:00
[2018-10-31 22:43:38] [/wiki/Special:EntityData/Q45448077.ttl?nocache=1540834314181&flavor=dump] executing Hive query looking for a matching web request
[2018-10-31 22:45:06] [/wiki/Special:EntityData/Q45448077.ttl?nocache=1540834314181&flavor=dump] No matching web requests found
[2018-10-31 22:45:06] [/wiki/Special:EntityData/Q56269637.ttl?nocache=1540834294825&flavor=dump] Error received at 2018-10-29 17:31:34.825000+00:00
[2018-10-31 22:45:06] [/wiki/Special:EntityData/Q56269637.ttl?nocache=1540834294825&flavor=dump] executing Hive query looking for a matching web request
[2018-10-31 22:46:33] [/wiki/Special:EntityData/Q56269637.ttl?nocache=1540834294825&flavor=dump] 1 matching requests found
[2018-10-31 22:46:33] [/wiki/Special:EntityData/Q47114514.ttl?nocache=1540834294825&flavor=dump] Error received at 2018-10-29 17:31:34.825000+00:00
[2018-10-31 22:46:33] [/wiki/Special:EntityData/Q47114514.ttl?nocache=1540834294825&flavor=dump] executing Hive query looking for a matching web request
[2018-10-31 22:47:03] [/wiki/Special:EntityData/Q47114514.ttl?nocache=1540834294825&flavor=dump] 1 matching requests found
[2018-10-31 22:47:03] [/wiki/Special:EntityData/Q33879438.ttl?nocache=1540834294825&flavor=dump] Error received at 2018-10-29 17:31:34.825000+00:00
[2018-10-31 22:47:03] [/wiki/Special:EntityData/Q33879438.ttl?nocache=1540834294825&flavor=dump] executing Hive query looking for a matching web request
[2018-10-31 22:47:59] [/wiki/Special:EntityData/Q33879438.ttl?nocache=1540834294825&flavor=dump] 1 matching requests found
[2018-10-31 22:47:59] [/wiki/Special:EntityData/Q46393243.ttl?nocache=1540834215605&flavor=dump] Error received at 2018-10-29 17:30:15.606000+00:00
[2018-10-31 22:47:59] [/wiki/Special:EntityData/Q46393243.ttl?nocache=1540834215605&flavor=dump] executing Hive query looking for a matching web request
[2018-10-31 22:48:51] [/wiki/Special:EntityData/Q46393243.ttl?nocache=1540834215605&flavor=dump] No matching web requests found
[2018-10-31 22:48:51] [/wiki/Special:EntityData/Q47415571.ttl?nocache=1540834215605&flavor=dump] Error received at 2018-10-29 17:30:15.606000+00:00
[2018-10-31 22:48:51] [/wiki/Special:EntityData/Q47415571.ttl?nocache=1540834215605&flavor=dump] executing Hive query looking for a matching web request
[2018-10-31 22:49:23] [/wiki/Special:EntityData/Q47415571.ttl?nocache=1540834215605&flavor=dump] No matching web requests found
[2018-10-31 22:49:23] [/wiki/Special:EntityData/Q56269804.ttl?nocache=1540834076592&flavor=dump] Error received at 2018-10-29 17:27:56.593000+00:00
[2018-10-31 22:49:23] [/wiki/Special:EntityData/Q56269804.ttl?nocache=1540834076592&flavor=dump] executing Hive query looking for a matching web request
[2018-10-31 22:49:53] [/wiki/Special:EntityData/Q56269804.ttl?nocache=1540834076592&flavor=dump] No matching web requests found
[2018-10-31 22:49:53] [/wiki/Special:EntityData/Q538426.ttl?nocache=1540834076592&flavor=dump] Error received at 2018-10-29 17:27:56.593000+00:00
[2018-10-31 22:49:53] [/wiki/Special:EntityData/Q538426.ttl?nocache=1540834076592&flavor=dump] executing Hive query looking for a matching web request
[2018-10-31 22:50:21] [/wiki/Special:EntityData/Q538426.ttl?nocache=1540834076592&flavor=dump] No matching web requests found
[2018-10-31 22:50:21] [/wiki/Special:EntityData/Q43141198.ttl?nocache=1540834076420&flavor=dump] Error received at 2018-10-29 17:27:56.420000+00:00
[2018-10-31 22:50:21] [/wiki/Special:EntityData/Q43141198.ttl?nocache=1540834076420&flavor=dump] executing Hive query looking for a matching web request
[2018-10-31 22:50:50] [/wiki/Special:EntityData/Q43141198.ttl?nocache=1540834076420&flavor=dump] 1 matching requests found

At a bare minimum, this makes me think that there may be multiple issues going on here. It also makes me wonder whether the error that the HTTPClient is throwing might be more generic than it appears from the exception text.

Actually, as I think about it, the lack of a matching webrequest is itself a potentially interesting clue. Varnish should have recorded the request even if there were timeouts from the backend, but there's no indication of that.

@BBlack @ema Question: Do varnish error logs get aggregated in to logstash? If not, how do I access them?

Given what we're seeing so far, I suggest that we let https://gerrit.wikimedia.org/r/471737 land and see if that changes behavior. It seems reasonable to me that it might be the culprit.

Doesn't appear to have solved the issue, but I need to verify that the patches have actually been deployed: https://logstash.wikimedia.org/goto/d72dcf3ef04eb8d02cb6a4c602754cfd

The patch has been deployed, and doesn't look like it prevents the issue:

18:05:29.346 [update 4] WARN  org.wikidata.query.rdf.tool.Updater - Retryable error syncing.  Retrying.
18:05:29.346 [update 7] WARN  org.wikidata.query.rdf.tool.Updater - Retryable error syncing.  Retrying.
18:08:45.046 [update 6] WARN  org.wikidata.query.rdf.tool.Updater - Retryable error syncing.  Retrying.
18:08:45.046 [update 0] WARN  org.wikidata.query.rdf.tool.Updater - Retryable error syncing.  Retrying.
19:56:44.648 [update 1] WARN  org.wikidata.query.rdf.tool.Updater - Retryable error syncing.  Retrying.
19:59:42.577 [update 0] WARN  org.wikidata.query.rdf.tool.Updater - Retryable error syncing.  Retrying.
19:59:42.577 [update 2] WARN  org.wikidata.query.rdf.tool.Updater - Retryable error syncing.  Retrying.
20:02:22.876 [update 1] WARN  org.wikidata.query.rdf.tool.Updater - Retryable error syncing.  Retrying.
20:02:22.877 [update 6] WARN  org.wikidata.query.rdf.tool.Updater - Retryable error syncing.  Retrying.
20:11:03.865 [update 0] WARN  org.wikidata.query.rdf.tool.Updater - Retryable error syncing.  Retrying.
20:11:03.865 [update 5] WARN  org.wikidata.query.rdf.tool.Updater - Retryable error syncing.  Retrying.
20:13:31.777 [update 5] WARN  org.wikidata.query.rdf.tool.Updater - Retryable error syncing.  Retrying.
20:14:14.867 [update 2] WARN  org.wikidata.query.rdf.tool.Updater - Retryable error syncing.  Retrying.
20:14:39.835 [update 4] WARN  org.wikidata.query.rdf.tool.Updater - Retryable error syncing.  Retrying.
20:15:16.837 [update 5] WARN  org.wikidata.query.rdf.tool.Updater - Retryable error syncing.  Retrying.
20:15:37.914 [update 2] WARN  org.wikidata.query.rdf.tool.Updater - Retryable error syncing.  Retrying.
20:15:38.577 [update 9] WARN  org.wikidata.query.rdf.tool.Updater - Retryable error syncing.  Retrying.
20:16:07.461 [update 4] WARN  org.wikidata.query.rdf.tool.Updater - Retryable error syncing.  Retrying.
20:16:07.505 [update 6] WARN  org.wikidata.query.rdf.tool.Updater - Retryable error syncing.  Retrying.
20:16:07.709 [update 3] WARN  org.wikidata.query.rdf.tool.Updater - Retryable error syncing.  Retrying.
20:16:51.627 [update 4] WARN  org.wikidata.query.rdf.tool.Updater - Retryable error syncing.  Retrying.
20:16:52.950 [update 0] WARN  org.wikidata.query.rdf.tool.Updater - Retryable error syncing.  Retrying.
20:16:54.337 [update 8] WARN  org.wikidata.query.rdf.tool.Updater - Retryable error syncing.  Retrying.
20:17:47.385 [update 7] WARN  org.wikidata.query.rdf.tool.Updater - Retryable error syncing.  Retrying.
20:17:49.781 [update 5] WARN  org.wikidata.query.rdf.tool.Updater - Retryable error syncing.  Retrying.
20:17:50.905 [update 1] WARN  org.wikidata.query.rdf.tool.Updater - Retryable error syncing.  Retrying.
20:17:51.889 [update 6] WARN  org.wikidata.query.rdf.tool.Updater - Retryable error syncing.  Retrying.
20:17:53.768 [update 3] WARN  org.wikidata.query.rdf.tool.Updater - Retryable error syncing.  Retrying.
20:48:04.277 [update 9] WARN  org.wikidata.query.rdf.tool.Updater - Retryable error syncing.  Retrying.
21:04:36.796 [update 3] WARN  org.wikidata.query.rdf.tool.Updater - Retryable error syncing.  Retrying.
21:18:20.854 [update 0] WARN  org.wikidata.query.rdf.tool.Updater - Retryable error syncing.  Retrying.

This is just for 4 hours. So it happens constantly (and looks like it also happens in batches).

So, an interesting thing: in at least some of these cases, there is a web request that is making it to wikidata, and that is returning a 200.

The request is retried if it fails, are you sure it's not the retry that you are seeing with 200?

So, an interesting thing: in at least some of these cases, there is a web request that is making it to wikidata, and that is returning a 200.

The request is retried if it fails, are you sure it's not the retry that you are seeing with 200?

It could be -- how quickly does it retry? Immediately? Or is there a delay?

It could be -- how quickly does it retry? Immediately? Or is there a delay?

I don't think there's a delay for NoHttpResponseException. There's 500ms delay for 503 and 429, but for other exceptions I think it retries immediately.

https://phabricator.wikimedia.org/diffusion/WDQR/browse/master/tools/src/main/java/org/wikidata/query/rdf/tool/wikibase/WikibaseRepository.java$566

Hrm. In that case, very likely that you're right, and what I'm seeing is
the retry.

Out of curiosity, have you examined GC behavior around the times that these
issues crop up?

@Imarlier nothing special in GC that can be linked to the errors. GC times seem to be low and unexceptional.

@BBlack @ema Couple of questions for you about Nginx:

  • Do we have nginx configured to handle a specific number of requests on a given worker process/thread, and then shut that down?
  • If it possible for nginx to be restarted (interrupting existing persistent connections) due to config updates or the like, and if so, is there a record of times when that has happened?

What I'm poking at is the issue described here: https://williamsbdev.com/posts/no-http-response-exceptions/.

The Wikidata updater process has a pool of connections to the text cache nodes, eg

imarlier@wdqs1003:~$ sudo netstat -anet | grep 208.80.154.224
tcp6       0      0 10.64.0.14:50536        208.80.154.224:443      ESTABLISHED 499        283749063
tcp6       0      0 10.64.0.14:50278        208.80.154.224:443      ESTABLISHED 499        283748769
tcp6       0      0 10.64.0.14:51154        208.80.154.224:443      ESTABLISHED 499        283750591
tcp6       0      0 10.64.0.14:50566        208.80.154.224:443      ESTABLISHED 499        283733664
tcp6       0      0 10.64.0.14:50564        208.80.154.224:443      ESTABLISHED 499        283749076
tcp6       0      0 10.64.0.14:51184        208.80.154.224:443      ESTABLISHED 499        283738772
tcp6       0      0 10.64.0.14:51648        208.80.154.224:443      ESTABLISHED 499        283750982
tcp6       0      0 10.64.0.14:50794        208.80.154.224:443      ESTABLISHED 499        283749269
tcp6       0      0 10.64.0.14:51662        208.80.154.224:443      ESTABLISHED 499        283750987
tcp6       0      0 10.64.0.14:50796        208.80.154.224:443      ESTABLISHED 499        283738743

The implication of the article linked above is that if the process that represents the remote side of these connections were to shut down while the connection itself is idle, it's possible that the socket would enter a CLOSE_WAIT state on the local side, and that in turn would result in the NoHttpResponseException that we're seeing in this case.

I'm doing some super super naive testing of this hypothesis on one of the wdqs hosts, but I'm curious whether it strikes you as being possible based on what we see here.

@Smalyshev Another thought: why not just disable pooling, and have the client close each connection after each request?

@Imarlier wouldn't that be slower? But I could try that too I guess.

@Smalyshev Yes, it would be slower, but it would also be diagnostic -- if persistent connections are disabled and the errors stop, we can be pretty confident that something about the way that they're configured is what's resulting in this issue.

(It also shouldn't be that much slower. Connection setup/teardown is a pretty small portion of the request time for the types of requests that are being sent on this channel.

I've been running this in a tmux session on a few of the wdqs servers: while :; do DSTAMP=$(date); CW=$(sudo netstat -anet | grep 208.80.154.224 | grep -c CLOSE_WAIT); echo "${DSTAMP}: ${CW}"; sleep 1; done >> ~/close_waits.txt. (154.224 is the edge for the text cache cluster.)

It's a very small sample at this point, but there are a few things that make me think that this is the issue, or one of the issues:

wdqs1006 had three errors thrown at 2018-12-03T20:09:56. This is the close_wait count around that time:

Mon Dec  3 20:09:30 UTC 2018: 0
Mon Dec  3 20:09:31 UTC 2018: 0
Mon Dec  3 20:09:32 UTC 2018: 0
Mon Dec  3 20:09:33 UTC 2018: 0
Mon Dec  3 20:09:34 UTC 2018: 0
Mon Dec  3 20:09:35 UTC 2018: 0
Mon Dec  3 20:09:36 UTC 2018: 0
Mon Dec  3 20:09:37 UTC 2018: 0
Mon Dec  3 20:09:38 UTC 2018: 10
Mon Dec  3 20:09:39 UTC 2018: 10
Mon Dec  3 20:09:40 UTC 2018: 10
Mon Dec  3 20:09:41 UTC 2018: 10
Mon Dec  3 20:09:42 UTC 2018: 10
Mon Dec  3 20:09:43 UTC 2018: 10
Mon Dec  3 20:09:44 UTC 2018: 10
Mon Dec  3 20:09:45 UTC 2018: 10
Mon Dec  3 20:09:46 UTC 2018: 10
Mon Dec  3 20:09:47 UTC 2018: 10
Mon Dec  3 20:09:48 UTC 2018: 10
Mon Dec  3 20:09:49 UTC 2018: 10
Mon Dec  3 20:09:50 UTC 2018: 10
Mon Dec  3 20:09:51 UTC 2018: 10
Mon Dec  3 20:09:52 UTC 2018: 10
Mon Dec  3 20:09:53 UTC 2018: 10
Mon Dec  3 20:09:54 UTC 2018: 10
Mon Dec  3 20:09:55 UTC 2018: 10
Mon Dec  3 20:09:57 UTC 2018: 0
Mon Dec  3 20:09:58 UTC 2018: 0
Mon Dec  3 20:09:59 UTC 2018: 0
Mon Dec  3 20:10:00 UTC 2018: 0

wdqs1004 had three exceptions thrown at 20:08:23. This is the close_wait count around that time:

Mon Dec  3 20:08:10 UTC 2018: 0
Mon Dec  3 20:08:11 UTC 2018: 0
Mon Dec  3 20:08:12 UTC 2018: 0
Mon Dec  3 20:08:13 UTC 2018: 0
Mon Dec  3 20:08:14 UTC 2018: 0
Mon Dec  3 20:08:15 UTC 2018: 10
Mon Dec  3 20:08:16 UTC 2018: 10
Mon Dec  3 20:08:17 UTC 2018: 10
Mon Dec  3 20:08:18 UTC 2018: 10
Mon Dec  3 20:08:19 UTC 2018: 10
Mon Dec  3 20:08:20 UTC 2018: 10
Mon Dec  3 20:08:21 UTC 2018: 10
Mon Dec  3 20:08:22 UTC 2018: 10
Mon Dec  3 20:08:23 UTC 2018: 0
Mon Dec  3 20:08:24 UTC 2018: 0
Mon Dec  3 20:08:25 UTC 2018: 0
Mon Dec  3 20:08:26 UTC 2018: 0
Mon Dec  3 20:08:27 UTC 2018: 0
Mon Dec  3 20:08:28 UTC 2018: 0
Mon Dec  3 20:08:29 UTC 2018: 0

wdqs1005 had three exceptions thrown at 20:07:51. This is the close_wait count around that time:

Mon Dec  3 20:07:20 UTC 2018: 0
Mon Dec  3 20:07:21 UTC 2018: 0
Mon Dec  3 20:07:22 UTC 2018: 10
Mon Dec  3 20:07:23 UTC 2018: 10
Mon Dec  3 20:07:24 UTC 2018: 10
Mon Dec  3 20:07:25 UTC 2018: 10
Mon Dec  3 20:07:26 UTC 2018: 10
Mon Dec  3 20:07:27 UTC 2018: 10
Mon Dec  3 20:07:28 UTC 2018: 10
Mon Dec  3 20:07:29 UTC 2018: 10
Mon Dec  3 20:07:30 UTC 2018: 10
Mon Dec  3 20:07:31 UTC 2018: 10
Mon Dec  3 20:07:32 UTC 2018: 10
Mon Dec  3 20:07:33 UTC 2018: 10
Mon Dec  3 20:07:35 UTC 2018: 10
Mon Dec  3 20:07:36 UTC 2018: 10
Mon Dec  3 20:07:37 UTC 2018: 10
Mon Dec  3 20:07:38 UTC 2018: 10
Mon Dec  3 20:07:39 UTC 2018: 10
Mon Dec  3 20:07:40 UTC 2018: 10
Mon Dec  3 20:07:41 UTC 2018: 10
Mon Dec  3 20:07:42 UTC 2018: 10
Mon Dec  3 20:07:43 UTC 2018: 10
Mon Dec  3 20:07:44 UTC 2018: 10
Mon Dec  3 20:07:45 UTC 2018: 10
Mon Dec  3 20:07:46 UTC 2018: 10
Mon Dec  3 20:07:47 UTC 2018: 10
Mon Dec  3 20:07:48 UTC 2018: 10
Mon Dec  3 20:07:49 UTC 2018: 10
Mon Dec  3 20:07:50 UTC 2018: 10
Mon Dec  3 20:07:51 UTC 2018: 0
Mon Dec  3 20:07:52 UTC 2018: 0
Mon Dec  3 20:07:53 UTC 2018: 0
Mon Dec  3 20:07:54 UTC 2018: 0
Mon Dec  3 20:07:55 UTC 2018: 0
Mon Dec  3 20:07:56 UTC 2018: 0
Mon Dec  3 20:07:57 UTC 2018: 0
Mon Dec  3 20:07:58 UTC 2018: 0
Mon Dec  3 20:07:59 UTC 2018: 0

wdqs1006 at 20:07:06:

Mon Dec  3 20:06:46 UTC 2018: 0
Mon Dec  3 20:06:47 UTC 2018: 0
Mon Dec  3 20:06:48 UTC 2018: 9
Mon Dec  3 20:06:49 UTC 2018: 9
Mon Dec  3 20:06:50 UTC 2018: 9
Mon Dec  3 20:06:51 UTC 2018: 9
Mon Dec  3 20:06:52 UTC 2018: 9
Mon Dec  3 20:06:53 UTC 2018: 9
Mon Dec  3 20:06:54 UTC 2018: 9
Mon Dec  3 20:06:55 UTC 2018: 9
Mon Dec  3 20:06:56 UTC 2018: 9
Mon Dec  3 20:06:57 UTC 2018: 9
Mon Dec  3 20:06:58 UTC 2018: 9
Mon Dec  3 20:07:00 UTC 2018: 9
Mon Dec  3 20:07:01 UTC 2018: 9
Mon Dec  3 20:07:02 UTC 2018: 9
Mon Dec  3 20:07:03 UTC 2018: 9
Mon Dec  3 20:07:04 UTC 2018: 9
Mon Dec  3 20:07:05 UTC 2018: 9
Mon Dec  3 20:07:06 UTC 2018: 0
Mon Dec  3 20:07:07 UTC 2018: 0
Mon Dec  3 20:07:08 UTC 2018: 0
Mon Dec  3 20:07:09 UTC 2018: 0
Mon Dec  3 20:07:10 UTC 2018: 0
Mon Dec  3 20:07:11 UTC 2018: 0

The pattern is consistent enough that I feel pretty comfortable concluding that this is the cause.

https://jira.apache.org/jira/browse/HTTPCLIENT-1918 has a bunch of suggestions around this. It also notes that HttpClient has connection management logging that can be used to detect cases where connections aren't being released back to the pool correctly.

@Smalyshev Guessing this should go back to you for followup?

Thanks, I'll try to play with the connection pooling and see what happens and report here.

  • If it possible for nginx to be restarted (interrupting existing persistent connections) due to config updates or the like,

Yes, we do reload nginx to pick up updated OCSP responses. That happens twice a day on any given cache host, 12 hours apart.

is there a record of times when that has happened?

Kind of. See the dips in established connections here: https://grafana.wikimedia.org/dashboard/db/varnish-machine-stats?panelId=25&fullscreen&orgId=1&from=now-24h&to=now&var-server=cp3030&var-datasource=esams%20prometheus%2Fops

The reloads are likely to be causing the issue described here.

I've tried opening a persistent connection to nginx on my laptop with printf "GET / HTTP/1.1\nConnection: keep-alive\nHost: 127.0.0.1\n\n" | nc localhost 80 and then reloading nginx. The persistent connection was closed immediately. Setting worker_shutdown_timeout to 60s did not make any difference, the connection was closed immediately upon reload.

Then I've tried downloading a large (128M) file with rate limiting to make sure the thing would take a while: curl -v --limit-rate 1k http://localhost/test.img

While the download was in progress I've reloaded nginx and confirmed that the old worker process was still up waiting for the transfer to finish:

www-data 23030 0.0 0.0 11396 3624 ? S 10:47 0:00 \_ nginx: worker process is shutting down

So, idle connections seem to get closed immediately on reload, while transfers are given the time to finish.

Related: T164579

I've reduced the pool lifetime to 1s (which should be essentially as if there was no pooling if I get it correctly), let's see what happens. I've also looked through the code, and I don't see any possibility we're leaking connections (could be wrong of course) so whatever happens seems to be happening outside Updater code.

Looks like with the change above errors are no longer happening. As I see no noticeable change on the dashboard tracking entity fetch times, I think the problem is solved.

Thanks for the help, @Imarlier and @ema!