
Fix partitions on CI slaves, some are missing /var/lib/docker
Open, Medium, Public

Description

Some integration-slave-docker instances have been provisioned with a dedicated /var/lib/docker partition, but others have not. It is a bit confusing.

$ sudo cumin --trace --force 'name:docker' 'mount -t ext4|sort'
13 hosts will be targeted:
integration-slave-docker-[1021,1034,1037,1040-1041,1043,1048-1054].integration.eqiad.wmflabs
===== NODE GROUP =====
(3) integration-slave-docker-[1021,1049,1053].integration.eqiad.wmflabs
----- OUTPUT of 'mount -t ext4|sort' -----
/dev/mapper/vd-second--local--disk on /srv type ext4 (rw,relatime,data=ordered)
/dev/vda3 on / type ext4 (rw,relatime,data=ordered)
===== NODE GROUP =====
(10) integration-slave-docker-[1034,1037,1040-1041,1043,1048,1050-1052,1054].integration.eqiad.wmflabs
----- OUTPUT of 'mount -t ext4|sort' -----
/dev/mapper/vd-docker on /var/lib/docker type ext4 (rw,relatime,data=ordered)
/dev/mapper/vd-second--local--disk on /srv type ext4 (rw,relatime,data=ordered)
/dev/vda3 on / type ext4 (rw,relatime,data=ordered)
================

Or in short, the following instances lack a dedicated /var/lib/docker partition (a quick per-host check is sketched after the list):

integration-slave-docker-1021.integration.eqiad.wmflabs
integration-slave-docker-1049.integration.eqiad.wmflabs
integration-slave-docker-1053.integration.eqiad.wmflabs
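
A quicker way to spot these than eyeballing the grouped mount output is to ask each host whether /var/lib/docker is a mount point at all. This is only a sketch and assumes mountpoint(1) from util-linux is installed on the instances:

$ sudo cumin --force 'name:docker' 'mountpoint -q /var/lib/docker || echo "no dedicated /var/lib/docker"'

Since cumin groups hosts by identical output, the instances missing the partition end up together in one node group.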

Maybe that is because they predate the introduction of /var/lib/docker and Puppet is unable to magically shuffle the partitions for us, in which case we would have to provision new instances and delete the old ones.

Event Timeline

I have added two Jenkins slaves with role::ci::slave::labs::docker and $docker_lvm_volume = true. One got a /var/lib/docker sub-partition, the other did not :]
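
For reference, a minimal way to check a freshly created agent after a Puppet run; this is a sketch assuming the standard puppet agent and util-linux tools on the instance, not the exact procedure used:

$ sudo puppet agent --test
$ findmnt /var/lib/docker    # prints nothing if no dedicated mount exists
$ sudo lvs vd                # list logical volumes in the "vd" volume group

If findmnt prints nothing and lvs shows no docker volume, Puppet did not create the logical volume on that run.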

Jdforrester-WMF assigned this task to hashar.
Jdforrester-WMF subscribed.

It appears this got fixed in the rebuild for stretch:

jforrester@integration-cumin-01:~$ sudo cumin --trace --force 'name:docker' 'mount -t ext4|sort'
16 hosts will be targeted:
integration-agent-docker-[1001-1014,1016].integration.eqiad.wmflabs,integration-agent-puppet-docker-1002.integration.eqiad.wmflabs
FORCE mode enabled, continuing without confirmation
===== NODE GROUP =====
(5) integration-agent-docker-[1001-1005].integration.eqiad.wmflabs
----- OUTPUT of 'mount -t ext4|sort' -----
/dev/mapper/vd-docker on /var/lib/docker type ext4 (rw,relatime,data=ordered)
/dev/mapper/vd-second--local--disk on /srv type ext4 (rw,relatime,data=ordered)
/dev/vda3 on / type ext4 (rw,relatime,data=ordered)
===== NODE GROUP =====
(11) integration-agent-docker-[1006-1014,1016].integration.eqiad.wmflabs,integration-agent-puppet-docker-1002.integration.eqiad.wmflabs
----- OUTPUT of 'mount -t ext4|sort' -----
/dev/mapper/vd-docker on /var/lib/docker type ext4 (rw,relatime,data=ordered)
/dev/mapper/vd-second--local--disk on /srv type ext4 (rw,relatime,data=ordered)
/dev/vda2 on / type ext4 (rw,relatime,data=ordered)
================
PASS:  |████████████████████████████████████████████████████████████████████████| 100% (16/16) [00:00<00:00, 19.79hosts/s]
FAIL:  |                                                                                 |   0% (0/16) [00:00<?, ?hosts/s]
100.0% (16/16) success ratio (>= 100.0% threshold) for command: 'mount -t ext4|sort'.
100.0% (16/16) success ratio (>= 100.0% threshold) of nodes successfully executed all commands.

It is a race condition / improper ordering somewhere in our Puppet manifests. The reason our instances are all fine now is that I manually fixed them upon creation :]
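
For the record, the manual fix on an affected instance looks roughly like the following. This is a sketch rather than the exact commands used: the volume group name "vd" comes from the mount output above, while the size and the data-copy steps are assumptions.

$ sudo systemctl stop docker
$ sudo lvcreate -n docker -L 24G vd        # size is a guess for illustration
$ sudo mkfs.ext4 /dev/vd/docker
$ sudo mv /var/lib/docker /var/lib/docker.old
$ sudo mkdir /var/lib/docker
$ echo '/dev/mapper/vd-docker /var/lib/docker ext4 defaults 0 2' | sudo tee -a /etc/fstab
$ sudo mount /var/lib/docker
$ sudo cp -a /var/lib/docker.old/. /var/lib/docker/    # keep existing images/layers
$ sudo rm -rf /var/lib/docker.old
$ sudo systemctl start docker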