
Create a dockerized Proton instance in the Beta Cluster
Closed, Resolved · Public

Description

Proton is currently running in the Beta Cluster on the deployment-chromium01 instance. We should create a new instance, using the role::beta::docker_services Puppet role, to host the Dockerized service using images from the deployment pipeline. The configuration will be similar to that used on deployment-push-notifications01.

I would suggest updating the naming convention to use the common name 'proton' to match the service name in k8s, making this new instance deployment-proton01 (or, for additional clarity, deployment-docker-proton01).

AC

  • Create new instance deployment-proton01 (or deployment-docker-proton01) from the latest Debian Buster image
  • Update the Puppet SSL cert to get Puppet running successfully with the Beta Cluster puppetmaster (see P7162 for an example of the procedure)
  • Add required hiera config, including the service configuration
  • Apply the role::beta::docker_services and ensure Puppet still runs successfully
  • Verify that the service is correctly serving internal requests
  • Create a security group (if needed) to expose the service port (3030) to incoming traffic, and apply it to deployment-[docker-]proton01
  • Migrate any existing references to deployment-chromium01 to deployment-[docker-]proton01
  • Delete the existing proton-beta.wmflabs.org web proxy and create a new web proxy from proton-beta.wmflabs.org to port 3030 on deployment-[docker-]proton01
  • Ensure that proton-beta.wmflabs.org correctly serves external requests
  • Destroy deployment-chromium01

Event Timeline

Restricted Application added a subscriber: Aklapper. · Jun 30 2020, 6:52 PM
LGoto triaged this task as Medium priority. · Jul 1 2020, 3:39 PM
LGoto moved this task from Needs triage to Backlog on the Product-Infrastructure-Team-Backlog board.
MSantos claimed this task. · Jul 10 2020, 2:17 PM
MSantos moved this task from To Do to Doing on the Product-Infrastructure-Team-Backlog (Kanban) board.
Mholloway updated the task description. · Jul 10 2020, 2:23 PM
MSantos updated the task description. · Jul 10 2020, 2:26 PM
MSantos updated the task description. · Jul 10 2020, 2:39 PM
MSantos updated the task description. · Jul 10 2020, 2:46 PM

@Mholloway after changing the puppet configuration for the service, does the service boot automatically? Where can you find the logs?

Mholloway added a comment. · Edited · Jul 10 2020, 4:48 PM

I logged into the instance and ran puppet, and saw that it was failing trying to find the provided Docker engine version:

mholloway-shell@deployment-docker-proton01:~$ sudo puppet agent -tv
Info: Using configured environment 'production'
Info: Retrieving pluginfacts
Info: Retrieving plugin
Info: Retrieving locales
Info: Loading facts
Info: Caching catalog for deployment-docker-proton01.deployment-prep.eqiad.wmflabs
Info: Applying configuration version '(542c1a289b) root - [WIP] webperf: Enable prometheus-apache-exporter'
Notice: The LDAP client stack for this host is: sssd/sudo
Notice: /Stage[main]/Profile::Ldap::Client::Labs/Notify[LDAP client stack]/message: defined 'message' as 'The LDAP client stack for this host is: sssd/sudo'
Error: Could not update: Execution of '/usr/bin/apt-get -q -y -o DPkg::Options::=--force-confold --force-yes install docker-engine=1.12.6-0~debian-jessie' returned 100: Reading package lists...
Building dependency tree...
Reading state information...
W: --force-yes is deprecated, use one of the options starting with --allow instead.
E: Version '1.12.6-0~debian-jessie' for 'docker-engine' was not found
Error: /Stage[main]/Docker/Package[docker-engine]/ensure: change from 'purged' to '1.12.6-0~debian-jessie' failed: Could not update: Execution of '/usr/bin/apt-get -q -y -o DPkg::Options::=--force-confold --force-yes install docker-engine=1.12.6-0~debian-jessie' returned 100: Reading package lists...
Building dependency tree...
Reading state information...
W: --force-yes is deprecated, use one of the options starting with --allow instead.
E: Version '1.12.6-0~debian-jessie' for 'docker-engine' was not found
Error: Systemd start for docker failed!
journalctl log for docker:
-- Logs begin at Fri 2020-07-10 14:24:41 UTC, end at Fri 2020-07-10 16:40:10 UTC. --
-- No entries --

Error: /Stage[main]/Profile::Docker::Engine/Service[docker]/ensure: change from 'stopped' to 'running' failed: Systemd start for docker failed!
journalctl log for docker:
-- Logs begin at Fri 2020-07-10 14:24:41 UTC, end at Fri 2020-07-10 16:40:10 UTC. --
-- No entries --

Notice: /Stage[main]/Profile::Docker::Runner/Service::Docker[mediawiki-services-chromium-render]/File[/etc/mediawiki-services-chromium-render]: Dependency Service[docker] has failures: true
Warning: /Stage[main]/Profile::Docker::Runner/Service::Docker[mediawiki-services-chromium-render]/File[/etc/mediawiki-services-chromium-render]: Skipping because of failed dependencies
Warning: /Stage[main]/Profile::Docker::Runner/Service::Docker[mediawiki-services-chromium-render]/File[/etc/mediawiki-services-chromium-render/config.yaml]: Skipping because of failed dependencies
Warning: /Stage[main]/Profile::Docker::Runner/Service::Docker[mediawiki-services-chromium-render]/Exec[docker pull of mediawiki-services-chromium-render:2020-04-09-191920-production for mediawiki-services-chromium-render]: Skipping because of failed dependencies
Warning: /Stage[main]/Profile::Docker::Runner/Service::Docker[mediawiki-services-chromium-render]/Systemd::Service[mediawiki-services-chromium-render]/Systemd::Unit[mediawiki-services-chromium-render]/File[/lib/systemd/system/mediawiki-services-chromium-render.service]: Skipping because of failed dependencies
Warning: /Stage[main]/Profile::Docker::Runner/Service::Docker[mediawiki-services-chromium-render]/Systemd::Service[mediawiki-services-chromium-render]/Systemd::Unit[mediawiki-services-chromium-render]/Exec[systemd daemon-reload for mediawiki-services-chromium-render.service]: Skipping because of failed dependencies
Warning: /Stage[main]/Profile::Docker::Runner/Service::Docker[mediawiki-services-chromium-render]/Systemd::Service[mediawiki-services-chromium-render]/Service[mediawiki-services-chromium-render]: Skipping because of failed dependencies
Info: Stage[main]: Unscheduling all events on Stage[main]
Notice: Applied catalog in 4.28 seconds

I got Puppet running successfully by updating profile::docker::engine::version to 18.09.1+dfsg1-7.1+deb10u1 and adding the line profile::docker::engine::packagename: docker.io.
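
The failure above is apt reporting that the pinned version does not exist in any configured repository. A minimal sketch for checking which version apt would install before pinning it in hiera, assuming a Debian host with `apt-cache` available; the helper name `candidate_version` is hypothetical and not part of any existing tooling:

```shell
# Hypothetical helper: extract the "Candidate:" version from
# `apt-cache policy <package>` output supplied on stdin.
candidate_version() {
    awk '$1 == "Candidate:" { print $2 }'
}

# Usage on the instance (package name taken from the hiera change above):
#   apt-cache policy docker.io | candidate_version
```

`apt-cache madison docker.io` lists every version the configured repositories offer; a value of profile::docker::engine::version that is not in that list would be expected to fail the same way as the Puppet run above.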

Now the service has been created, but it does not yet appear to be up and running successfully:

mholloway-shell@deployment-docker-proton01:~$ sudo service mediawiki-services-chromium-render status
● mediawiki-services-chromium-render.service - Systemd runner for mediawiki-services-chromium-render
   Loaded: loaded (/lib/systemd/system/mediawiki-services-chromium-render.service; enabled; vendor preset: enabled)
   Active: activating (auto-restart) (Result: exit-code) since Fri 2020-07-10 16:47:55 UTC; 6s ago
  Process: 7826 ExecStartPre=/usr/bin/docker stop mediawiki-services-chromium-render.service (code=exited, status=1/FAILURE)
  Process: 7832 ExecStartPre=/usr/bin/docker rm mediawiki-services-chromium-render.service (code=exited, status=1/FAILURE)
  Process: 7838 ExecStart=/usr/bin/docker run --rm=true -p 3030:3030 -v /etc/mediawiki-services-chromium-render/:/etc/mediawiki-services-chromium-render --name mediawiki-services-chromium-render.service 
 Main PID: 7838 (code=exited, status=1/FAILURE)

Jul 10 16:47:55 deployment-docker-proton01 systemd[1]: mediawiki-services-chromium-render.service: Main process exited, code=exited, status=1/FAILURE
Jul 10 16:47:55 deployment-docker-proton01 systemd[1]: mediawiki-services-chromium-render.service: Failed with result 'exit-code'.
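
One way to look for the reason behind the exit status above, sketched under the assumption that the unit runs `docker run` in the foreground (as the ExecStart line suggests), so the container's stdout and stderr end up in the systemd journal. The function name is hypothetical; the unit name is the one shown in the status output:

```shell
UNIT=mediawiki-services-chromium-render.service

# Hypothetical helper: print the most recent journal entries for the unit.
# The optional first argument is the number of lines (default 50).
# Reading the system journal may require root or membership of a journal group.
recent_unit_logs() {
    journalctl --unit="$UNIT" --lines="${1:-50}" --no-pager
}

# Usage on the instance:
#   recent_unit_logs 100
```

Because the container is started with `--rm=true`, it is removed as soon as it exits, so `docker logs` is unlikely to help once the unit has failed; the journal is probably the better place to look.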
MSantos updated the task description. · Mon, Jul 13, 7:35 PM

Change 612406 had a related patch set uploaded (by MSantos; owner: MSantos):
[operations/puppet@production] update proton beta instance for restbase

https://gerrit.wikimedia.org/r/612406

@Mholloway I'm almost finished with the setup, but after changing the web proxy for proton I'm getting 504 errors, even though the service works perfectly for internal requests. Did you have the same experience?

Try https://proton-beta.wmflabs.org/en.wikipedia.org/v1/pdf/Cat/letter

The pdfrender security group that was applied to the instance was not allowing incoming requests on port 3030. I added a rule to allow this, and now the instance appears to be working well.
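
A request made on the instance itself never passes through the security group, which may explain why internal checks succeeded while the proxy returned 504. A rough sketch for comparing the two paths, using a hypothetical helper `http_status`; the internal URL is an assumption (it reuses the instance FQDN from the Puppet output and the path from the external URL above):

```shell
# Hypothetical helper: print only the HTTP status code for a URL.
# curl prints 000 when no HTTP response was received (e.g. a blocked port).
http_status() {
    curl --silent --output /dev/null --max-time 10 --write-out '%{http_code}' "$1"
}

# From another instance in the project (exercises the security group rule):
#   http_status http://deployment-docker-proton01.deployment-prep.eqiad.wmflabs:3030/en.wikipedia.org/v1/pdf/Cat/letter
# Through the web proxy:
#   http_status https://proton-beta.wmflabs.org/en.wikipedia.org/v1/pdf/Cat/letter
```

If the first check times out while a request to localhost:3030 on the instance succeeds, a missing ingress rule for port 3030 is a likely cause.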

MSantos updated the task description. · Tue, Jul 14, 2:38 PM

Change 612406 merged by Dzahn:
[operations/puppet@production] update proton beta instance for restbase

https://gerrit.wikimedia.org/r/612406

Mholloway closed this task as Resolved. · Tue, Aug 4, 10:01 PM
Mholloway updated the task description.