Dumps not accessible from container pods
Closed, Resolved (Public)

Description

The folder /public/dumps/public/eswiki is not accessible from the jdk8 or jdk11 container pods on the ToolLabs servers.

In particular, /public/dumps is accessible but not the subfolder /public/dumps/public.

However, the folder is accessible from the regular tool account, before starting the pod.

This folder was still accessible from these pods one month ago.

Thanks in advance,

Event Timeline

$ webservice jdk11 shell
$ ls -alh /public/dumps
total 16K
drwxr-xr-x 2 root root 4.0K Mar 13 20:36 .
drwxr-xr-x 3 root root 4.0K Apr 27 02:36 ..
lrwxrwxrwx 1 root root   52 Mar 13 20:36 incr -> /mnt/nfs/dumps-labstore1007.wikimedia.org/other/incr
lrwxrwxrwx 1 root root   68 Mar 13 20:36 pagecounts-all-sites -> /mnt/nfs/dumps-labstore1007.wikimedia.org/other/pagecounts-all-sites
lrwxrwxrwx 1 root root   62 Mar 13 20:36 pagecounts-raw -> /mnt/nfs/dumps-labstore1007.wikimedia.org/other/pagecounts-raw
lrwxrwxrwx 1 root root   57 Mar 13 20:36 pageviews -> /mnt/nfs/dumps-labstore1007.wikimedia.org/other/pageviews
lrwxrwxrwx 1 root root   42 Mar 13 20:36 public -> /mnt/nfs/dumps-labstore1007.wikimedia.org/
$ ls -lh /mnt
total 0

This is a bind mount problem, and almost certainly not specific to any particular Kubernetes container type. We are mounting the /public/dumps directory from the Kubernetes exec node into the container, but that directory is only a collection of symlinks to other mounts on the exec node. Because we are not also exposing those additional mounts to the container, the symlinks go nowhere.
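For illustration, the dangling symlink can be confirmed from inside a pod roughly like this (a sketch; exact output will vary):

$ webservice jdk11 shell
$ readlink /public/dumps/public
/mnt/nfs/dumps-labstore1007.wikimedia.org/
$ ls /public/dumps/public
ls: cannot access '/public/dumps/public': No such file or directory

The symlink target simply does not exist inside the container, because nothing is mounted under /mnt/nfs there.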

I'm not quite sure what the right fix is here. I can see three initial options:

  1. Mount /mnt/nfs/dumps-labstore1007.wikimedia.org instead of /public/dumps. This would be fragile because things would break badly when labstore1007 is taken offline for maintenance and labstore1006 is expected to take over.
  2. Mount each of the directories under /public/dumps individually into the container. I'm pretty sure this would work because the symlink would be resolved on the exec host side of things; it is how /data/project ends up being mounted. That is also a symlink on the exec node, but the symlinked volume shows up inside the container as expected. What I don't know is whether this would also have the NFS primary failover problem. (If not, does that mean that /data/project is also resistant to NFS primary failover?)
  3. Mount both /mnt/nfs/dumps-labstore1006.wikimedia.org and /mnt/nfs/dumps-labstore1007.wikimedia.org in those locations inside the container in addition to mounting the symlink farm in /public/dumps. This would make the containers most directly like the bastions and exec nodes. I think this would preserve operations inside a container in the event of symlink switching on the underlying exec node.

@Bstorm does any of these seem "best" to you? Or better yet do you have a more clever idea of how to fix this?

Ahhhh, I understand what's up now. What changed is the exports on the dumps servers.

The original solution was to mount the specific symlinks into the containers, just like we do for the /home and /data/project dirs. With the old exports, I believe this would have just worked because /public/dumps was also a symlink, so the symlinks under it were still valid.

All that said, there were problems when the symlinks changed because they were deleted and recreated. Let me test what happens on a cluster of this version when the container mounts a host dir that is a symlink that goes away and comes back. If that works now, let's revive solution #2.

If it doesn't work, #3 could have unintended consequences as well because of the kernel's difficulty in letting go of a mount. #1 would be way too fragile. On the last failovers it worked alright, but let's test quickly.

Ok, so when the symlink changes, the container retains the original definition of the symlink. That means it effectively will not fail over until the pod is deleted or restarted. This explains why shutting down a dumps server caused endlessly rising load on the Kubernetes cluster in the past: the symlinks cannot be forgotten by a running pod.

My comment in T247455#6086002 would seem to support #3 as the option to go with. That will require a bit more work than it first appears.
The steps are:

  1. Update maintain-kubeusers to modify new PSPs to allow mounting of the /mnt/nfs/dumps-labstore100?.wikimedia.org dirs in the pod
  2. Update the pod-preset for new tools in maintain-kubeusers to include those volumes
  3. Backfill all existing PSPs and pod-presets (one each per tool) to use that new volume as well with a script or something.
  4. Restart any webservices that need the dumps mounts (because the pod-preset change only applies to pods created after the change, regardless of the deployment, AFAIK); see the example below.
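For step 4, restarting a tool's webservice from the bastion would look roughly like this (a sketch; "mytool" is a placeholder tool name):

$ become mytool
$ webservice restart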

This may fix the last mystery about dumps failover, honestly.

Change 592747 had a related patch set uploaded (by Bstorm; owner: Bstorm):
[labs/tools/maintain-kubeusers@master] tests: fix cassette generation and add some testing

https://gerrit.wikimedia.org/r/592747

Change 592747 merged by Bstorm:
[labs/tools/maintain-kubeusers@master] tests: fix cassette generation and add some testing

https://gerrit.wikimedia.org/r/592747

Ok, so I manually patched a PSP in toolsbeta to allow the hostPath /mnt/nfs for the test tool as a read-only volume. Then I manually patched the volumes and mounts in the deployment:

toolsbeta.test@toolsbeta-sgebastion-04:~$ kubectl exec -it test-86457849c7-h8ckj -- ls /public/dumps/public
10wikipedia		      mhwiki
404.html		      mhwiktionary
aawiki			      minwiki
aawikibooks		      minwiktionary
aawiktionary		      mirrors.html
abwiki			      miwiki
abwiktionary		      miwikibooks
acewiki			      miwiktionary
advisorywiki		      mkwiki
adywiki			      mkwikibooks
afwiki			      mkwikimedia
afwikibooks		      mkwikisource
afwikiquote		      mkwiktionary
afwiktionary		      mlwiki

So there's a PoC. I think enabling /mnt/nfs read-only as the allowed path prefix for this purpose only seems ok. Alternatively, we could hardcode the mounts of /mnt/nfs/dumps-labstore1006.wikimedia.org and /mnt/nfs/dumps-labstore1007.wikimedia.org in there, but I dislike the idea that this may lock us into those hostnames in the PSPs. As it is, they'll be hardcoded into the pod-presets. Doing so does not affect the read-write status of the tool project dir or other volume mounts, despite the symlinking, because the container is pretty specific about how it mounts each volume.
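For reference, the allowedHostPaths side of the PSP ends up with entries roughly like these (a sketch showing only the dumps-related entries; the full list with all the other Toolforge mounts is in the patch in the script further down):

allowedHostPaths:
  - pathPrefix: /public/dumps
    readOnly: true
  - pathPrefix: /mnt/nfs
    readOnly: true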

Rebuilding the cassettes for the patch before I put it up for review. The change to maintain-kubeusers will only affect new users, but it is a good place to discuss the change.

Changing everyone's pod-preset and psp is more of a shell script kinda deal, I think.

Change 592786 had a related patch set uploaded (by Bstorm; owner: Bstorm):
[labs/tools/maintain-kubeusers@master] k8s-dumps: make the symlinks for dumps NFS work inside toolforge k8s

https://gerrit.wikimedia.org/r/592786

Going to test patching every psp and preset in toolsbeta that has certain labels to see if that works instead of scripting. It might be smoother and easier to replicate in the future.

So, patching PSPs works great as long as you either use fancy JSON patching or replace the whole array (the latter option is fine). You cannot update a pod preset in v1.15, apparently, come Hell or high water. It nicely replies with 200s, but the object will never change whether you use patch or kubectl apply -f. I suspect I'm going to have to delete and recreate each one.
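For the record, the "fancy JSON patching" route would be a JSON Patch append to the allowedHostPaths array, something like this (a sketch; the PSP name is a placeholder), while the script further down simply replaces the whole array:

$ kubectl patch psp tool-mytool-psp --type=json \
    -p='[{"op": "add", "path": "/spec/allowedHostPaths/-", "value": {"pathPrefix": "/mnt/nfs", "readOnly": true}}]'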

Even kubectl replace has no effect on a PodPreset in this version. That's maddening. Delete and create/apply does work.
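So the working sequence for an existing preset is delete-then-recreate, which is what the script further down does for each tool namespace (a sketch; preset.yaml stands in for the generated manifest):

$ kubectl -n "$ns" delete podpreset mount-toolforge-vols
$ kubectl -n "$ns" apply -f preset.yaml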

Change 592786 merged by jenkins-bot:
[labs/tools/maintain-kubeusers@master] k8s-dumps: make the symlinks for dumps NFS work inside toolforge k8s

https://gerrit.wikimedia.org/r/592786

Mentioned in SAL (#wikimedia-cloud) [2020-04-28T22:58:04Z] <bstorm_> rebuilding docker-registry.tools.wmflabs.org/maintain-kubeusers:beta T247455

Mentioned in SAL (#wikimedia-cloud) [2020-04-29T16:52:14Z] <bstorm_> tagged docker-registry.tools.wmflabs.org/maintain-kubeusers:beta to latest to deploy to toolforge T247455

Mentioned in SAL (#wikimedia-cloud) [2020-04-29T16:54:25Z] <bstorm_> deleted the maintain-kubeusers pod to start running the new image T247455

Ok, now I just need to apply the update to existing tools.

Ok, I think this should do it:

#!/bin/bash
# Run this script with your root/cluster admin account as appropriate.
# This will fix the dumps mounts for all existing tools.

set -Eeuo pipefail

function check-ns(){
    ns=$1
    preset=$(kubectl -n "$ns" get podpresets mount-toolforge-vols -o yaml)
    if [[ $preset =~ ^.*/mnt/nfs/.*$ ]]
    then
        return 1
    else
        return 0
    fi
}

declare -a namespaces
readarray -t namespaces < <(kubectl get ns -l tenancy=tool --no-headers=true -o custom-columns=:metadata.name)

for ns in "${namespaces[@]}"
do
    echo "Starting for $ns"
    if check-ns "$ns"; then
        echo "Deleting preset for $ns"
        kubectl -n "$ns" delete podpresets mount-toolforge-vols
        cat <<EOF | kubectl apply -f -
apiVersion: settings.k8s.io/v1alpha1
kind: PodPreset
metadata:
  name: mount-toolforge-vols
  namespace: $ns
spec:
  env:
    - name: HOME
      value: /data/project/${ns:5}
  selector:
    matchLabels:
      toolforge: tool
  volumeMounts:
    - mountPath: /public/dumps
      name: dumps
      readOnly: true
    - mountPath: /mnt/nfs/dumps-labstore1007.wikimedia.org
      name: dumpsrc1
      readOnly: true
    - mountPath: /mnt/nfs/dumps-labstore1006.wikimedia.org
      name: dumpsrc2
      readOnly: true
    - mountPath: /data/project
      name: home
    - mountPath: /etc/wmcs-project
      name: wmcs-project
      readOnly: true
    - mountPath: /data/scratch
      name: scratch
    - mountPath: /etc/ldap.conf
      name: etcldap-conf
      readOnly: true
    - mountPath: /etc/ldap.yaml
      name: etcldap-yaml
      readOnly: true
    - mountPath: /etc/novaobserver.yaml
      name: etcnovaobserver-yaml
      readOnly: true
    - mountPath: /var/lib/sss/pipes
      name: sssd-pipes
  volumes:
    - hostPath:
        path: /public/dumps
        type: Directory
      name: dumps
    - hostPath:
        path: /mnt/nfs/dumps-labstore1007.wikimedia.org
        type: Directory
      name: dumpsrc1
    - hostPath:
        path: /mnt/nfs/dumps-labstore1006.wikimedia.org
        type: Directory
      name: dumpsrc2
    - hostPath:
        path: /data/project
        type: Directory
      name: home
    - hostPath:
        path: /etc/wmcs-project
        type: File
      name: wmcs-project
    - hostPath:
        path: /data/scratch
        type: Directory
      name: scratch
    - hostPath:
        path: /etc/ldap.conf
        type: File
      name: etcldap-conf
    - hostPath:
        path: /etc/ldap.yaml
        type: File
      name: etcldap-yaml
    - hostPath:
        path: /etc/novaobserver.yaml
        type: File
      name: etcnovaobserver-yaml
    - hostPath:
        path: /var/lib/sss/pipes
        type: Directory
      name: sssd-pipes
EOF
        echo "created new preset for $ns"
    else
        echo "skipping $ns preset -- already updated"
    fi
    kubectl patch psp "${ns}-psp" --patch '{"spec":{"allowedHostPaths":[{"pathPrefix":"/var/lib/sss/pipes"},{"pathPrefix":"/data/project"},{"pathPrefix":"/data/scratch"},{"pathPrefix":"/public/dumps","readOnly":true},{"pathPrefix":"/mnt/nfs","readOnly":true},{"pathPrefix":"/etc/wmcs-project","readOnly":true},{"pathPrefix":"/etc/ldap.yaml","readOnly":true},{"pathPrefix":"/etc/novaobserver.yaml","readOnly":true},{"pathPrefix":"/etc/ldap.conf","readOnly":true}]}}'
    echo "Finished $ns"
done

echo "*********************"
echo "Done!"

I'll run this in toolsbeta to find out.

Oops, forgot to sanitize the output from get ns.

Ok, after a couple more edits, that paste actually seems to do the correct thing 😅

Running a final test on a webservice now that it has run across toolsbeta.

Mentioned in SAL (#wikimedia-cloud) [2020-04-29T19:48:32Z] <bstorm_> ran the scary rewrite-psp-preset.sh script across toolsbeta T247455

Mentioned in SAL (#wikimedia-cloud) [2020-04-29T21:28:51Z] <bstorm_> running the rewrite-psp-preset.sh script across all tools T247455

Mentioned in SAL (#wikimedia-cloud) [2020-04-29T22:13:19Z] <bstorm_> running a fixup script after fixing a bug T247455

Fixed the error in the paste that caused the rework (for future reference).

Ok, at this point, you should be able to interact with dumps NFS in Toolforge Kubernetes in any restarted or newly started service.

I have restarted my tool in a jdk11 container and now the dumps are accessible again.

Thanks a lot!!

Bstorm claimed this task.