Instance metadata service returning HTTP 500 for majority of instances checked
Closed, Resolved, Public

Description

krenair@deployment-cumin:~$ sudo cumin '*' 'curl "http://169.254.169.254/latest/meta-data/instance-type" -s'
69 hosts will be targeted:
deployment-acme-chief[03-04].deployment-prep.eqiad.wmflabs,deployment-aqs[01-03].deployment-prep.eqiad.wmflabs,deployment-cache-text05.deployment-prep.eqiad.wmflabs,deployment-cache-upload04.deployment-prep.eqiad.wmflabs,deployment-changeprop.deployment-prep.eqiad.wmflabs,deployment-chromium[01-02].deployment-prep.eqiad.wmflabs,deployment-conf03.deployment-prep.eqiad.wmflabs,deployment-cpjobqueue.deployment-prep.eqiad.wmflabs,deployment-cumin.deployment-prep.eqiad.wmflabs,deployment-db[05-06].deployment-prep.eqiad.wmflabs,deployment-deploy[01-02].deployment-prep.eqiad.wmflabs,deployment-dumps-puppetmaster02.deployment-prep.eqiad.wmflabs,deployment-elastic[05-07].deployment-prep.eqiad.wmflabs,deployment-etcd-01.deployment-prep.eqiad.wmflabs,deployment-eventgate-analytics-1.deployment-prep.eqiad.wmflabs,deployment-eventlog05.deployment-prep.eqiad.wmflabs,deployment-fluorine02.deployment-prep.eqiad.wmflabs,deployment-imagescaler[01-03].deployment-prep.eqiad.wmflabs,deployment-ircd.deployment-prep.eqiad.wmflabs,deployment-jobrunner03.deployment-prep.eqiad.wmflabs,deployment-kafka-jumbo-[1-2].deployment-prep.eqiad.wmflabs,deployment-kafka-main-[1-2].deployment-prep.eqiad.wmflabs,deployment-logstash2.deployment-prep.eqiad.wmflabs,deployment-maps[04-05].deployment-prep.eqiad.wmflabs,deployment-mathoid.deployment-prep.eqiad.wmflabs,deployment-mcs01.deployment-prep.eqiad.wmflabs,deployment-mediawiki-[07,09].deployment-prep.eqiad.wmflabs,deployment-memc[04-07].deployment-prep.eqiad.wmflabs,deployment-ms-be[03-04].deployment-prep.eqiad.wmflabs,deployment-ms-fe02.deployment-prep.eqiad.wmflabs,deployment-mwmaint01.deployment-prep.eqiad.wmflabs,deployment-mx02.deployment-prep.eqiad.wmflabs,deployment-ores01.deployment-prep.eqiad.wmflabs,deployment-parsoid09.deployment-prep.eqiad.wmflabs,deployment-pdfrender02.deployment-prep.eqiad.wmflabs,deployment-poolcounter04.deployment-prep.eqiad.wmflabs,deployment-prometheus02.deployment-prep.eqiad.wmflabs,deployment-puppetdb02.deployment-prep.eqiad.wmflabs,deployment-puppetmaster03.deployment-prep.eqiad.wmflabs,deployment-restbase[01-02].deployment-prep.eqiad.wmflabs,deployment-sca[01-02,04].deployment-prep.eqiad.wmflabs,deployment-sentry01.deployment-prep.eqiad.wmflabs,deployment-sessionstore01.deployment-prep.eqiad.wmflabs,deployment-snapshot01.deployment-prep.eqiad.wmflabs,deployment-urldownloader02.deployment-prep.eqiad.wmflabs,deployment-webperf[11-12].deployment-prep.eqiad.wmflabs,deployment-zookeeper02.deployment-prep.eqiad.wmflabs
Confirm to continue [y/n]? y
===== NODE GROUP =====
(47) deployment-acme-chief[03-04].deployment-prep.eqiad.wmflabs,deployment-aqs03.deployment-prep.eqiad.wmflabs,deployment-chromium02.deployment-prep.eqiad.wmflabs,deployment-conf03.deployment-prep.eqiad.wmflabs,deployment-cpjobqueue.deployment-prep.eqiad.wmflabs,deployment-cumin.deployment-prep.eqiad.wmflabs,deployment-db[05-06].deployment-prep.eqiad.wmflabs,deployment-dumps-puppetmaster02.deployment-prep.eqiad.wmflabs,deployment-elastic[05-07].deployment-prep.eqiad.wmflabs,deployment-etcd-01.deployment-prep.eqiad.wmflabs,deployment-eventgate-analytics-1.deployment-prep.eqiad.wmflabs,deployment-eventlog05.deployment-prep.eqiad.wmflabs,deployment-fluorine02.deployment-prep.eqiad.wmflabs,deployment-imagescaler02.deployment-prep.eqiad.wmflabs,deployment-ircd.deployment-prep.eqiad.wmflabs,deployment-jobrunner03.deployment-prep.eqiad.wmflabs,deployment-kafka-jumbo-[1-2].deployment-prep.eqiad.wmflabs,deployment-kafka-main-[1-2].deployment-prep.eqiad.wmflabs,deployment-mcs01.deployment-prep.eqiad.wmflabs,deployment-mediawiki-07.deployment-prep.eqiad.wmflabs,deployment-memc[04-06].deployment-prep.eqiad.wmflabs,deployment-ms-be[03-04].deployment-prep.eqiad.wmflabs,deployment-mwmaint01.deployment-prep.eqiad.wmflabs,deployment-mx02.deployment-prep.eqiad.wmflabs,deployment-ores01.deployment-prep.eqiad.wmflabs,deployment-parsoid09.deployment-prep.eqiad.wmflabs,deployment-pdfrender02.deployment-prep.eqiad.wmflabs,deployment-prometheus02.deployment-prep.eqiad.wmflabs,deployment-puppetmaster03.deployment-prep.eqiad.wmflabs,deployment-restbase02.deployment-prep.eqiad.wmflabs,deployment-sca[01-02,04].deployment-prep.eqiad.wmflabs,deployment-sentry01.deployment-prep.eqiad.wmflabs,deployment-sessionstore01.deployment-prep.eqiad.wmflabs,deployment-snapshot01.deployment-prep.eqiad.wmflabs,deployment-urldownloader02.deployment-prep.eqiad.wmflabs,deployment-webperf11.deployment-prep.eqiad.wmflabs
----- OUTPUT of 'curl "http://169...nstance-type" -s' -----
500 Internal Server Error

Remote metadata server experienced an internal server error.


===== NODE GROUP =====
(8) deployment-changeprop.deployment-prep.eqiad.wmflabs,deployment-chromium01.deployment-prep.eqiad.wmflabs,deployment-mathoid.deployment-prep.eqiad.wmflabs,deployment-ms-fe02.deployment-prep.eqiad.wmflabs,deployment-poolcounter04.deployment-prep.eqiad.wmflabs,deployment-puppetdb02.deployment-prep.eqiad.wmflabs,deployment-webperf12.deployment-prep.eqiad.wmflabs,deployment-zookeeper02.deployment-prep.eqiad.wmflabs
----- OUTPUT of 'curl "http://169...nstance-type" -s' -----
m1.small
===== NODE GROUP =====
(2) deployment-deploy[01-02].deployment-prep.eqiad.wmflabs
----- OUTPUT of 'curl "http://169...nstance-type" -s' -----
c8.m8.s60
===== NODE GROUP =====
(1) deployment-logstash2.deployment-prep.eqiad.wmflabs
----- OUTPUT of 'curl "http://169...nstance-type" -s' -----
m1.xlarge
===== NODE GROUP =====
(7) deployment-aqs[01-02].deployment-prep.eqiad.wmflabs,deployment-cache-text05.deployment-prep.eqiad.wmflabs,deployment-cache-upload04.deployment-prep.eqiad.wmflabs,deployment-imagescaler[01,03].deployment-prep.eqiad.wmflabs,deployment-memc07.deployment-prep.eqiad.wmflabs
----- OUTPUT of 'curl "http://169...nstance-type" -s' -----
m1.medium
===== NODE GROUP =====
(4) deployment-maps[04-05].deployment-prep.eqiad.wmflabs,deployment-mediawiki-09.deployment-prep.eqiad.wmflabs,deployment-restbase01.deployment-prep.eqiad.wmflabs
----- OUTPUT of 'curl "http://169...nstance-type" -s' -----
m1.large
================
PASS: |█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 100% (69/69) [00:35<00:00, 1.94hosts/s]
FAIL: | | 0% (0/69) [00:35<?, ?hosts/s]
100.0% (69/69) success ratio (>= 100.0% threshold) for command: 'curl "http://169...nstance-type" -s'.
100.0% (69/69) success ratio (>= 100.0% threshold) of nodes successfully executed all commands.

This causes errors like these from facter:

Failed to fetch ec2 uri http://169.254.169.254/latest/meta-data/block-device-mapping/: 500 Internal Server Error
Could not retrieve fact='ec2_metadata', resolution='rest': undefined method `each' for nil:NilClass

These facter errors show up in Puppet runs and may be responsible for strange Puppet errors such as:

Error: Could not retrieve catalog from remote server: Error 500 on SERVER: Server Error: Evaluation Error: Error while evaluating a Function Call, Reading data from Hosts/deployment-cumin failed: NoMethodError: undefined method `[]' for nil:NilClass at /etc/puppet/manifests/realm.pp:24:14 on node deployment-cumin.deployment-prep.eqiad.wmflabs
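
The facter failure quoted earlier ("undefined method `each' for nil:NilClass") looks like code iterating over a metadata listing that was never fetched. As a rough sketch in Python, not facter's actual Ruby implementation, a metadata walk can record a failed subtree as `None` instead of raising. The names `walk_metadata` and `fetch` are hypothetical; `fetch` is assumed to return the response body as text, or `None` on any error, and entries ending in `/` are assumed to be sub-listings, as in the `block-device-mapping/` URL above. Special cases in the real metadata tree are ignored here.

```python
def walk_metadata(fetch, path=""):
    """Recursively collect a metadata tree into nested dicts.

    `fetch(path)` is a hypothetical callable returning the body text for
    `path`, or None when the request failed (for example on HTTP 500).
    A failed listing or value is stored as None rather than raising.
    """
    listing = fetch(path)
    if listing is None:
        # The listing itself could not be fetched; report that to the caller.
        return None
    tree = {}
    for entry in listing.splitlines():
        entry = entry.strip()
        if not entry:
            continue
        if entry.endswith("/"):
            # Assumed convention: a trailing slash marks a sub-listing.
            tree[entry.rstrip("/")] = walk_metadata(fetch, path + entry)
        else:
            tree[entry] = fetch(path + entry)
    return tree
```

A caller can then check for `None` explicitly before iterating, which is the step the error message suggests was skipped.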

Event Timeline

Krenair assigned this task to Andrew.

<andrewbogott> Krenair: the api server (which actually provides metadata) isn't getting the requests.
<andrewbogott> The neutron agent that runs on the cloudvirts is complaining about lost queue messages, so I'm restarting rabbit in response to that
<andrewbogott> Krenair, try your test again?
<Krenair> andrewbogott, that did the trick
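
A note for re-running the test: the cumin run in the description reported a 100% success ratio even though 47 of the 69 hosts returned the 500 error body. That is presumably because `curl -s` without `--fail` exits 0 on an HTTP error status, so the failure is only visible in the grouped output. The following is a minimal sketch of a check that treats any HTTP error as a failure. The function name `fetch_metadata` and its `opener` parameter are illustrative assumptions, not part of any existing tooling.

```python
import urllib.error
import urllib.request

# Same endpoint as the curl command in the description.
METADATA_URL = "http://169.254.169.254/latest/meta-data/instance-type"


def fetch_metadata(url=METADATA_URL, timeout=5.0, opener=urllib.request.urlopen):
    """Return (ok, text) for a metadata URL.

    ok is False for any HTTP error status or connection problem, so a
    wrapper script can turn it into a non-zero exit code.
    `opener` is injectable only to make the function testable offline.
    """
    try:
        with opener(url, timeout=timeout) as resp:
            return True, resp.read().decode("utf-8", "replace").strip()
    except urllib.error.HTTPError as err:
        # urlopen raises HTTPError for 4xx/5xx responses such as the 500 here.
        return False, f"{err.code} {err.reason}"
    except (urllib.error.URLError, OSError) as err:
        # Connection refused, timeout, unreachable network, and similar.
        return False, str(err)
```

A small wrapper that exits non-zero when `ok` is false would let cumin count affected hosts under FAIL rather than PASS.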