
deployments to analytics1030 failing
Closed, Resolved | Public | 2 Estimated Story Points

Event Timeline

*I think* the /srv partition on analytics1030 is full, and both today's deployment and the one Dan did last week have failed.

df: /mnt/hdfs: Input/output error
Filesystem                           Size  Used Avail Use% Mounted on
udev                                  32G     0   32G   0% /dev
tmpfs                                6.3G  702M  5.7G  11% /run
/dev/mapper/analytics1030--vg-root    55G   25G   28G  47% /
tmpfs                                 32G     0   32G   0% /dev/shm
tmpfs                                5.0M     0  5.0M   0% /run/lock
tmpfs                                 32G     0   32G   0% /sys/fs/cgroup
/dev/mapper/analytics1030--vg-srv     54G   52G     0 100% /srv   -----> TOO FULL
/dev/sda1                            922M   67M  792M   8% /boot
/dev/mapper/analytics1030--vg-mysql   54G  974M   51G   2% /var/lib/mysql
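
The check that the df output above performs by eye could be scripted. This is a minimal sketch using only the Python standard library. The function names and the 95% threshold are illustrative assumptions, not part of scap or of any existing monitoring on these hosts. It computes the percentage as used / (used + available), which is how df appears to arrive at 100% for /srv even though the size is 54G and only 52G is used.

```python
import shutil


def df_use_percent(used: int, avail: int) -> float:
    """Use% in the style of df: used / (used + space available to non-root users)."""
    denom = used + avail
    return 100.0 * used / denom if denom else 0.0


def is_nearly_full(path: str, threshold: float = 95.0) -> bool:
    """Return True when the filesystem holding `path` is at or above `threshold` percent.

    The threshold default is an arbitrary illustrative value.
    """
    usage = shutil.disk_usage(path)  # named tuple: total, used, free (in bytes)
    return df_use_percent(usage.used, usage.free) >= threshold
```

On a deployment target, `is_nearly_full("/srv")` could be called before starting a deployment to fail early instead of partway through a fetch.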

State of cache on an-coord1001:

nuria@an-coord1001:/srv/deployment/analytics/refinery-cache/revs$ ls -la
total 20
drwxr-xr-x  5 analytics-deploy analytics-deploy 4096 Jul 17 21:06 .
drwxr-xr-x  4 analytics-deploy analytics-deploy 4096 Jul 17 21:37 ..
drwxr-xr-x 10 analytics-deploy analytics-deploy 4096 Jul 11 19:52 3296aab3c19c7fb22b1410f89d0da4a2ba8a8bdf
drwxr-xr-x  2 analytics-deploy analytics-deploy 4096 Jul 11 20:20 4e9894c0db04ee39be0f094d0ef11ebbff198834
drwxr-xr-x 10 analytics-deploy analytics-deploy 4096 Jul 17 21:06 4f07755485ce819c9c1026611d077ace26842e56

State of cache on analytics1030:

nuria@analytics1030:/srv/deployment/analytics/refinery-cache/revs$ ls -la
total 24
drwxr-xr-x  6 analytics-deploy analytics-deploy 4096 Jul 17 21:06 .
drwxr-xr-x  4 analytics-deploy analytics-deploy 4096 Jul 11 20:11 ..
drwxr-xr-x 10 analytics-deploy analytics-deploy 4096 Jul 11 19:52 3296aab3c19c7fb22b1410f89d0da4a2ba8a8bdf
drwxr-xr-x 10 analytics-deploy analytics-deploy 4096 Jul  1 20:19 4e9894c0db04ee39be0f094d0ef11ebbff198834
drwxr-xr-x 10 analytics-deploy analytics-deploy 4096 Jul 17 21:06 4f07755485ce819c9c1026611d077ace26842e56
drwxr-xr-x 10 analytics-deploy analytics-deploy 4096 Jul  2 01:26 b8a496b174bfb8965090ceaa3ad0ef48db5eec61
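
The two listings above can be compared mechanically. The sketch below copies the revision hashes from the ls output into two sets and takes the difference. The variable names are illustrative; gathering the listings from the hosts (for example over ssh) is left out and would need to be added.

```python
# Revision directories under refinery-cache/revs, copied from the listings above.
an_coord1001 = {
    "3296aab3c19c7fb22b1410f89d0da4a2ba8a8bdf",
    "4e9894c0db04ee39be0f094d0ef11ebbff198834",
    "4f07755485ce819c9c1026611d077ace26842e56",
}
analytics1030 = {
    "3296aab3c19c7fb22b1410f89d0da4a2ba8a8bdf",
    "4e9894c0db04ee39be0f094d0ef11ebbff198834",
    "4f07755485ce819c9c1026611d077ace26842e56",
    "b8a496b174bfb8965090ceaa3ad0ef48db5eec61",
}

# Revisions cached on analytics1030 but not on an-coord1001, and vice versa.
only_on_analytics1030 = sorted(analytics1030 - an_coord1001)
only_on_an_coord1001 = sorted(an_coord1001 - analytics1030)

print("only on analytics1030:", only_on_analytics1030)
print("only on an-coord1001:", only_on_an_coord1001)
```

The only revision present on analytics1030 and absent from an-coord1001 is b8a496b174bfb8965090ceaa3ad0ef48db5eec61.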

It seems that all scap targets should have their caches in the same state, but that is not happening (probably failed deployments did not get cleaned up). Also, the scap config is set to keep two copies, yet there are three in the cache:

nuria@deploy1001:/srv/deployment/analytics/refinery$ more scap/scap.cfg
[global]
git_repo: analytics/refinery
git_deploy_dir: /srv/deployment
git_repo_user: analytics-deploy
ssh_user: analytics-deploy
server_groups: canary, default
canary_dsh_targets: target-canary
dsh_targets: targets
git_submodules: False
git_fat: True
cache_revs: 2
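
To see which cached revisions exceed the cache_revs setting, something like the following dry-run sketch could be used. It is an assumption-laden illustration, not scap's actual cleanup logic: it reads cache_revs with Python's configparser (which accepts the `key: value` form used in scap.cfg) and ranks revision directories by modification time, which may not match how scap decides what to keep. Both function names are hypothetical, and nothing is deleted.

```python
import configparser
from pathlib import Path
from typing import List


def read_cache_revs(cfg_text: str) -> int:
    """Read cache_revs from the [global] section of scap.cfg-style text."""
    parser = configparser.ConfigParser()
    parser.read_string(cfg_text)
    return parser.getint("global", "cache_revs")


def prune_candidates(revs_dir: str, keep: int) -> List[str]:
    """List revision directories beyond the `keep` most recently modified.

    Assumption: recency is judged by directory mtime. This only reports
    names; it never removes anything.
    """
    dirs = [p for p in Path(revs_dir).iterdir() if p.is_dir()]
    dirs.sort(key=lambda p: p.stat().st_mtime, reverse=True)
    return sorted(p.name for p in dirs[keep:])
```

For example, `prune_candidates("/srv/deployment/analytics/refinery-cache/revs", read_cache_revs(cfg_text))` would list the directories a human should review before removing anything by hand. The currently deployed revision should be checked separately, since mtime alone does not identify it.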

I think we need to remove b8a496b174bfb8965090ceaa3ad0ef48db5eec61 from analytics1030 to be able to deploy, but I cannot do it because I do not have sudo. Pinging @Ottomata and @elukey so that either of you can remove it and we can continue with the deployment tomorrow.

Nuria moved this task from Next Up to Done on the Analytics-Kanban board.
Nuria set the point value for this task to 2.
Milimetric triaged this task as High priority.
Milimetric moved this task from Incoming to Operational Excellence on the Analytics board.