Toolforge's static webserver broken by Puppet changes and stale nginx packages
Closed, Declined · Public

Assigned To

None

Authored By

	bd808
	Sep 14 2017, 5:04 AM

Description

The server at https://tools-static.wmflabs.org/ broke around 2017-09-13T13:24. Investigation showed that the nginx-related Puppet configuration had changed recently (rOPUPfb85f58). @BBlack confirmed that a more recent version of nginx was needed for compatibility with the new configuration. A simple apt-get install nginx-common fixed the server by upgrading from 1.11.3-1+wmf2 to 1.11.10-1+wmf3.

This highlights a couple of issues that we should find better means of addressing:

1. Toolforge and Cloud VPS broadly depend on unattended-upgrades to keep system packages up to date. This dependency can cause problems both by upgrading things that should not be upgraded (T159254) and, as was seen here, by not upgrading things that should be upgraded. It would be useful to have some standard practices and/or monitoring systems that make it easier for any Cloud VPS tenant to know when packages need to be upgraded because of security or required functionality changes.
2. Puppet changes to shared components (apache, nginx, Puppet, HHVM, Kubernetes, etc.) which are used in WMF's main server clusters, Cloud VPS / Toolforge infrastructure, and other Cloud VPS projects could be announced better. It's unreasonable to expect all such changes to be reviewed by everyone who might be impacted, but it would be nice to find a lightweight way of making others aware of changes which could cause a loss of service if packages are not up to date or when other manual interventions are needed.
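
As one illustration of the monitoring idea in the first point, the following is a minimal sketch, not an existing Toolforge tool. It assumes the text comes from a simulated run such as `apt-get -s upgrade`, whose `Inst` lines carry the installed version in square brackets and the candidate version in parentheses. The function name, the watched-package set, and the origin string in the sample are hypothetical; the version numbers are the ones from this incident.

```python
import re

# Matches simulation lines of the (assumed) form:
#   Inst nginx-common [1.11.3-1+wmf2] (1.11.10-1+wmf3 some-origin [all])
INST_LINE = re.compile(r"^Inst (?P<name>\S+) \[(?P<old>[^\]]+)\] \((?P<new>\S+)")


def pending_upgrades(simulation_output, watched):
    """Return {package: (installed, candidate)} for watched packages only."""
    pending = {}
    for line in simulation_output.splitlines():
        match = INST_LINE.match(line)
        if match and match.group("name") in watched:
            pending[match.group("name")] = (match.group("old"), match.group("new"))
    return pending


# Hypothetical sample; the origin label "example-origin" is made up.
SAMPLE = """\
Inst nginx-common [1.11.3-1+wmf2] (1.11.10-1+wmf3 example-origin [all])
Inst python-requests [2.4.3-6] (2.12.3-1 example-origin [all])
Conf nginx-common (1.11.10-1+wmf3 example-origin [all])
"""

if __name__ == "__main__":
    for name, (old, new) in pending_upgrades(SAMPLE, {"nginx-common"}).items():
        print(f"{name}: {old} -> {new}")
```

A check like this only reports what apt already considers upgradable; deciding which packages belong in the watched set is the part that still needs the cross-team communication described in the second point.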

These issues are separate but, at least in my mind, related. Loss of communication was a risk identified in the process of separating the cloud-services-team from the main SRE team. I don't think that this separation is the root cause of either issue, but it does exacerbate gaps in signaling that existed prior to the split. Communication lag and tooling differences have long caused problems for Cloud VPS tenants who closely track upstream changes from production. If we can find communication methods that scale, we will be better able to serve both inter-team needs and the needs of our users.

In the case of the nginx upgrade issue, there was even intra-team confusion and/or tooling failure. A prior merge of the same patch (rOPUP1811def) on 2017-07-13 caused https://tools.wmflabs.org/ to fail. At that time nginx was manually upgraded on the tools-proxy-* and project-proxy nodes, but not on the tools-static-* hosts. This shows that I lacked knowledge of where the nginx deployments critical to Cloud Services operation are located, and that we have no functional signaling mechanism for out-of-sync package versions within Toolforge and likely other Cloud VPS projects.
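
A signaling mechanism for out-of-sync versions could start from something as small as the sketch below. It assumes the installed version of one package has already been collected per host by some other means; the host names are hypothetical stand-ins for the tools-proxy-* and tools-static-* groups, and treating the most common version as the reference is a simplification rather than a statement about which version is correct.

```python
from collections import Counter


def find_version_drift(installed):
    """Given {host: version}, return (majority_version, {host: version} of outliers)."""
    if not installed:
        return None, {}
    counts = Counter(installed.values())
    majority, _ = counts.most_common(1)[0]
    outliers = {host: ver for host, ver in installed.items() if ver != majority}
    return majority, outliers


# Hypothetical inventory; host names are illustrative only.
NGINX_COMMON = {
    "tools-proxy-a": "1.11.10-1+wmf3",
    "tools-proxy-b": "1.11.10-1+wmf3",
    "tools-static-a": "1.11.3-1+wmf2",
}

if __name__ == "__main__":
    majority, outliers = find_version_drift(NGINX_COMMON)
    for host, version in sorted(outliers.items()):
        print(f"{host}: {version} (most hosts run {majority})")
```

The comparison is plain string equality, so it flags any difference without trying to order Debian version strings.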

As written, this ticket is not directly actionable, but it can serve as a place to gather discussion of the broad issues. We should fork off subtasks as more actionable issues are uncovered by the discussion.

Related Objects

Mentioned In: T180811: tools cluster: packages have conffile prompts and needs to be upgraded manually
Mentioned Here: T177920: unattended-upgrades not upgrading "-wikimedia" packages automatically in wmcs
T169247: Document recommended process for installing vendor provided package upgrades in Wikimedia VPS
rOPUP1811def52602: ssl_ciphersuite: limit ECDH curves where possible
rOPUPfb85f58ffc69: Revert "Revert "ssl_ciphersuite: limit ECDH curves where possible""
T159254: Blacklist apache from unattended-upgrades on tools puppetmaster

Event Timeline

bd808 created this task. Sep 14 2017, 5:04 AM

Restricted Application added a subscriber: Aklapper. Sep 14 2017, 5:04 AM

bd808 triaged this task as Medium priority. Sep 14 2017, 5:04 AM

bd808 added a project: cloud-services-team (Kanban).

bd808 moved this task from Inbox to Needs discussion on the cloud-services-team (Kanban) board.

Quiddity renamed this task from Toolforge's static websever broken by Puppet changes and stale nginx packages to Toolforge's static webserver broken by Puppet changes and stale nginx packages. Sep 14 2017, 5:47 AM

A side note.

When deploying T177920 into the tools cluster, we found an issue with unattended-upgrades and the nginx packages which required manual intervention.
The issue only showed up on the tools-elastic-XX nodes, and @bd808 fixed all of them by hand. I think a manual install was enough.

The unattended-upgrades log was something like this:

[...]
Packages that will be upgraded: arcconf debdeploy-client diamond librsvg2-2 librsvg2-common libssl1.0.0 linux-base nginx-common nginx-light openjdk-8-jdk openjdk-8-jdk-headless openjdk-8-jre openjdk-8-jre-headless openssl python-diamond python-requests python-urllib3
Writing dpkg log to '/var/log/unattended-upgrades/unattended-upgrades-dpkg.log'
Preconfiguring packages ...
(Reading database ... 54158 files and directories currently installed.)
Preparing to unpack .../libssl1.0.0_1.0.2m-1~wmf1_amd64.deb ...
Unpacking libssl1.0.0:amd64 (1.0.2m-1~wmf1) over (1.0.2d-1~wmf1) ...
Selecting previously unselected package libssl1.1:amd64.
Preparing to unpack .../libssl1.1_1.1.0g-1+wmf1_amd64.deb ...
Unpacking libssl1.1:amd64 (1.1.0g-1+wmf1) ...
Preparing to unpack .../librsvg2-common_2.40.18-1~wmf1_amd64.deb ...
Unpacking librsvg2-common:amd64 (2.40.18-1~wmf1) over (2.40.5-1+deb8u2) ...
Preparing to unpack .../librsvg2-2_2.40.18-1~wmf1_amd64.deb ...
Unpacking librsvg2-2:amd64 (2.40.18-1~wmf1) over (2.40.5-1+deb8u2) ...
Preparing to unpack .../arcconf_1%3a2.02.22404-1_amd64.deb ...
Unpacking arcconf (1:2.02.22404-1) over (7.31.18856-1) ...
Preparing to unpack .../debdeploy-client_0.0.99.1-1+deb8u1_all.deb ...
Unpacking debdeploy-client (0.0.99.1-1+deb8u1) over (0.0.99-2+deb8u1) ...
Preparing to unpack .../python-diamond_4.0.515-4~bpo8+2_all.deb ...
Unpacking python-diamond (4.0.515-4~bpo8+2) over (3.5-5) ...
Replacing files in old package diamond (3.5-5) ...
Preparing to unpack .../diamond_4.0.515-4~bpo8+2_all.deb ...
Unpacking diamond (4.0.515-4~bpo8+2) over (3.5-5) ...
Preparing to unpack .../nginx-common_1.13.6-2+wmf1~jessie1_all.deb ...
Unpacking nginx-common (1.13.6-2+wmf1~jessie1) over (1.9.4-1+wmf2) ...
Preparing to unpack .../nginx-light_1.13.6-2+wmf1~jessie1_amd64.deb ...
Unpacking nginx-light (1.13.6-2+wmf1~jessie1) over (1.9.4-1+wmf2) ...
Selecting previously unselected package libnginx-mod-http-echo.
Preparing to unpack .../libnginx-mod-http-echo_1.13.6-2+wmf1~jessie1_amd64.deb ...
Unpacking libnginx-mod-http-echo (1.13.6-2+wmf1~jessie1) ...
Preparing to unpack .../linux-base_4.3~bpo8+1_all.deb ...
Unpacking linux-base (4.3~bpo8+1) over (3.5) ...
Preparing to unpack .../openjdk-8-jdk_8u151-b12-1~bpo8+1_amd64.deb ...
Unpacking openjdk-8-jdk:amd64 (8u151-b12-1~bpo8+1) over (8u131-b11-1~bpo8+1) ...
Preparing to unpack .../openjdk-8-jdk-headless_8u151-b12-1~bpo8+1_amd64.deb ...
Unpacking openjdk-8-jdk-headless:amd64 (8u151-b12-1~bpo8+1) over (8u131-b11-1~bpo8+1) ...
Preparing to unpack .../openjdk-8-jre_8u151-b12-1~bpo8+1_amd64.deb ...
Unpacking openjdk-8-jre:amd64 (8u151-b12-1~bpo8+1) over (8u131-b11-1~bpo8+1) ...
Preparing to unpack .../openjdk-8-jre-headless_8u151-b12-1~bpo8+1_amd64.deb ...
Unpacking openjdk-8-jre-headless:amd64 (8u151-b12-1~bpo8+1) over (8u131-b11-1~bpo8+1) ...
Preparing to unpack .../openssl_1.0.2m-1~wmf1_amd64.deb ...
Unpacking openssl (1.0.2m-1~wmf1) over (1.0.1t-1+deb8u7) ...
Preparing to unpack .../python-urllib3_1.19.1-1_all.deb ...
Unpacking python-urllib3 (1.19.1-1) over (1.9.1-3) ...
Preparing to unpack .../python-requests_2.12.3-1_all.deb ...
Unpacking python-requests (2.12.3-1) over (2.4.3-6) ...
Processing triggers for libgdk-pixbuf2.0-0:amd64 (2.31.1-2+deb8u6) ...
Processing triggers for systemd (215-17+deb8u1) ...
Processing triggers for man-db (2.7.0.2-5) ...
Processing triggers for mime-support (3.58) ...
Processing triggers for hicolor-icon-theme (0.13-1) ...
Setting up libssl1.0.0:amd64 (1.0.2m-1~wmf1) ...
Setting up libssl1.1:amd64 (1.1.0g-1+wmf1) ...
Setting up librsvg2-2:amd64 (2.40.18-1~wmf1) ...
Setting up librsvg2-common:amd64 (2.40.18-1~wmf1) ...
Setting up arcconf (1:2.02.22404-1) ...
Setting up debdeploy-client (0.0.99.1-1+deb8u1) ...
Setting up python-diamond (4.0.515-4~bpo8+2) ...
Setting up diamond (4.0.515-4~bpo8+2) ...
Installing new version of config file /etc/diamond/diamond.conf.example ...
Installing new version of config file /etc/init.d/diamond ...
Installing new version of config file /etc/init/diamond.conf ...
Setting up nginx-common (1.13.6-2+wmf1~jessie1) ...
Installing new version of config file /etc/logrotate.d/nginx ...
Installing new version of config file /etc/nginx/nginx.conf ...

Configuration file '/etc/nginx/sites-available/default'
 ==> Deleted (by you or by a script) since installation.
 ==> Package distributor has shipped an updated version.
   What would you like to do about it ?  Your options are:
    Y or I  : install the package maintainer's version
    N or O  : keep your currently-installed version
      D     : show the differences between the versions
      Z     : start a shell to examine the situation
 The default action is to keep your current version.
*** default (Y/I/N/O/D/Z) [default=N] ? dpkg: error processing package nginx-common (--configure):
 EOF on stdin at conffile prompt
dpkg: dependency problems prevent configuration of libnginx-mod-http-echo:
 libnginx-mod-http-echo depends on nginx-common (= 1.13.6-2+wmf1~jessie1); however:
  Package nginx-common is not configured yet.

dpkg: error processing package libnginx-mod-http-echo (--configure):
 dependency problems - leaving unconfigured
dpkg: dependency problems prevent configuration of nginx-light:
 nginx-light depends on libnginx-mod-http-echo (= 1.13.6-2+wmf1~jessie1); however:
  Package libnginx-mod-http-echo is not configured yet.
 nginx-light depends on nginx-common (= 1.13.6-2+wmf1~jessie1); however:
  Package nginx-common is not configured yet.

dpkg: error processing package nginx-light (--configure):
 dependency problems - leaving unconfigured
Setting up linux-base (4.3~bpo8+1) ...
Setting up openjdk-8-jre-headless:amd64 (8u151-b12-1~bpo8+1) ...
Installing new version of config file /etc/java-8-openjdk/security/java.security ...
Setting up openjdk-8-jre:amd64 (8u151-b12-1~bpo8+1) ...
Setting up openjdk-8-jdk-headless:amd64 (8u151-b12-1~bpo8+1) ...
Setting up openjdk-8-jdk:amd64 (8u151-b12-1~bpo8+1) ...
Setting up openssl (1.0.2m-1~wmf1) ...
Setting up python-urllib3 (1.19.1-1) ...
Setting up python-requests (2.12.3-1) ...
Processing triggers for libc-bin (2.19-18+deb8u10) ...
Processing triggers for libgdk-pixbuf2.0-0:amd64 (2.31.1-2+deb8u6) ...
Processing triggers for systemd (215-17+deb8u1) ...
Errors were encountered while processing:
 nginx-common
 libnginx-mod-http-echo
 nginx-light
Error in function: 
SystemError: E:Sub-process /usr/bin/dpkg returned an error code (1)
Exception happened during upgrade.
Traceback (most recent call last):
  File "/usr/bin/unattended-upgrades", line 392, in upgrade_normal
    res = cache.commit(install_progress=iprogress)
  File "/usr/lib/python3/dist-packages/apt/cache.py", line 505, in commit
    raise SystemError("installArchives() failed")
SystemError: installArchives() failed
Installing the upgrades failed!
error message: 'installArchives() failed'
dpkg returned a error! See '/var/log/unattended-upgrades/unattended-upgrades-dpkg.log' for details
Extracting content from '/var/log/unattended-upgrades/unattended-upgrades-dpkg.log' since '2017-11-16 16:29:41.996752'

The unattended-upgrades failure was caused by a conffile prompt for the /etc/nginx/sites-enabled/default file, which is provided by the package but removed by our Puppet configuration via the File['/etc/nginx/sites-enabled'] resource in ::nginx.
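
To spot this failure mode without reading the whole log, a small scan such as the sketch below might help. It assumes the log text has the same shape as the excerpt above: a "Configuration file '...'" line, followed later by a "dpkg: error processing package ... (--configure):" line whose next line mentions the conffile prompt. The function name is hypothetical, and the sample is trimmed from the log quoted in this comment.

```python
import re

CONFFILE = re.compile(r"^Configuration file '(?P<path>[^']+)'")
FAILED = re.compile(r"dpkg: error processing package (?P<pkg>\S+) \(--configure\):")
PROMPT_EOF = "EOF on stdin at conffile prompt"


def conffile_prompt_failures(log_text):
    """Return (package, conffile) pairs where configuration stopped at a conffile prompt."""
    failures = []
    last_conffile = None
    lines = log_text.splitlines()
    for index, line in enumerate(lines):
        conffile = CONFFILE.match(line)
        if conffile:
            # Remember the most recent conffile that dpkg asked about.
            last_conffile = conffile.group("path")
            continue
        failed = FAILED.search(line)
        # Only count failures whose following line names the prompt as the cause;
        # dependency-only failures are skipped.
        if failed and index + 1 < len(lines) and PROMPT_EOF in lines[index + 1]:
            failures.append((failed.group("pkg"), last_conffile))
    return failures


# Trimmed from the unattended-upgrades log above.
SAMPLE_LOG = """\
Configuration file '/etc/nginx/sites-available/default'
 ==> Deleted (by you or by a script) since installation.
 ==> Package distributor has shipped an updated version.
*** default (Y/I/N/O/D/Z) [default=N] ? dpkg: error processing package nginx-common (--configure):
 EOF on stdin at conffile prompt
dpkg: error processing package libnginx-mod-http-echo (--configure):
 dependency problems - leaving unconfigured
"""

if __name__ == "__main__":
    for package, path in conffile_prompt_failures(SAMPLE_LOG):
        print(f"{package} stopped at conffile prompt for {path}")
```

Packages that failed only because of dependency problems (libnginx-mod-http-echo and nginx-light in the log) are deliberately left out, since they should configure once the blocking package is fixed.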

aborrero mentioned this in T180811: tools cluster: packages have conffile prompts and needs to be upgraded manually. Nov 20 2017, 11:55 AM

Krinkle updated the task description. Dec 22 2017, 6:12 PM

• Bstorm moved this task from Needs discussion to Inbox on the cloud-services-team (Kanban) board. Oct 29 2018, 1:13 PM

• Phabricator_maintenance moved this task from Backlog to Acknowledged on the SRE board. Jan 26 2019, 9:28 PM

• Bstorm closed this task as Declined. Dec 18 2019, 4:44 PM
