cannot delete non-empty directory: php-1.29.0-wmf.3 messages on 'scap sync' on mwdebug1002
Closed, Resolved (Public)

Description

I am guessing there are some permission issues here?

addshore@mwdebug1002:~$ scap pull
14:02:22 Copying to mwdebug1002.eqiad.wmnet from deployment.eqiad.wmnet
14:02:22 Started rsync common
cannot delete non-empty directory: php-1.29.0-wmf.4/cache/l10n
cannot delete non-empty directory: php-1.29.0-wmf.4/cache/l10n
cannot delete non-empty directory: php-1.29.0-wmf.4/cache
cannot delete non-empty directory: php-1.29.0-wmf.4/cache
cannot delete non-empty directory: php-1.29.0-wmf.4
cannot delete non-empty directory: php-1.29.0-wmf.3/cache/l10n
cannot delete non-empty directory: php-1.29.0-wmf.3/cache/l10n
cannot delete non-empty directory: php-1.29.0-wmf.3/cache
cannot delete non-empty directory: php-1.29.0-wmf.3/cache
cannot delete non-empty directory: php-1.29.0-wmf.3
14:02:33 Finished rsync common (duration: 00m 10s)

Event Timeline

Looks like these messages still appear today during EU swat!

mmodell claimed this task.

I just saw this again today in EU swat.

These are generally caused by some files owned by l10nupdate with the wrong permissions (not group-writable) ... The solution would be to adjust the umask for the l10nupdate process.
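
To illustrate the umask point, here is a minimal sketch that runs in a throwaway temporary directory, not on a real host. The file names are made up, and 002 is only an assumed example of a group-friendly umask; the actual l10nupdate configuration is not shown in this task.

```shell
#!/bin/sh
# Sketch only: the umask in effect when a file is created decides
# whether it ends up group-writable. All names here are hypothetical.
demo=$(mktemp -d)

# Restrictive umask: the new file is created as rw-r--r-- (no group write).
( umask 022; touch "$demo/made-with-022.cdb" )

# Group-friendly umask: the new file is created as rw-rw-r--.
( umask 002; touch "$demo/made-with-002.cdb" )

# List regular files that are NOT group-writable (group write bit 020 unset).
not_group_writable=$(find "$demo" -type f ! -perm -020)
echo "$not_group_writable"
```

The same find expression could be pointed at a version directory to spot files that a different group member would be unable to modify.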

thcipriani claimed this task.
thcipriani added subscribers: mmodell, thcipriani.

What's happening here is that we removed the php-1.29.0-wmf.{3,4} directories on tin; however, rsync --delete does not have permission to remove files on a remote machine that are owned by a different user (in this case l10nupdate).

To fix this, run scap clean 1.29.0-wmf.3 from /srv/mediawiki-staging on tin, which manually removes the files from both the deployment hosts and the targets.

Mentioned in SAL (#wikimedia-operations) [2017-02-23T14:42:31Z] <addshore> addshore@tin scap clean 1.29.0-wmf.6 && scap clean 1.29.0-wmf.7 (to remove warning on scap pull on mwdebug1002, T157030)

zeljkofilipin subscribed.

This is happening again.

zfilipin@mwdebug1002:~$ scap pull
14:18:30 Copying to mwdebug1002.eqiad.wmnet from tin.eqiad.wmnet
14:18:30 Started rsync common
cannot delete non-empty directory: php-1.31.0-wmf.17/cache/l10n
cannot delete non-empty directory: php-1.31.0-wmf.17/cache/l10n
cannot delete non-empty directory: php-1.31.0-wmf.17/cache
cannot delete non-empty directory: php-1.31.0-wmf.17/cache
cannot delete non-empty directory: php-1.31.0-wmf.17
cannot delete non-empty directory: php-1.31.0-wmf.20/cache/l10n
cannot delete non-empty directory: php-1.31.0-wmf.20/cache/l10n
cannot delete non-empty directory: php-1.31.0-wmf.20/cache
14:18:33 Finished rsync common (duration: 00m 03s)
zeljkofilipin reassigned this task from thcipriani to demon.
zeljkofilipin added a subscriber: demon.

@thcipriani said (in #wikimedia-operations) @demon is the best person to take a look at this. :)

Has nothing to do with scap clean. We've been fighting this same error message for years.

I thought this was solved in scap clean at some point? IIRC the root problem is that we have to remove the l10n cdb files explicitly before running a scap pull, since rsync will refuse to delete them: they are owned by l10nupdate and not mwdeploy.

Sorta. From what I can tell, rsync won't delete a destination directory if it still has files in it (considering the source directory didn't have them). mwdeploy can delete them (and in fact, it's what I use to delete them).

From reading rsync(1), I think we could get away with slapping a --force on the arg list.

Fwiw, this works just fine:

SSH_AUTH_SOCK=/run/keyholder/proxy.sock dsh -F 20 -M -g mediawiki-installation -r ssh -o -oUser=mwdeploy -- rm -rf /srv/mediawiki/php-1.31.0-wmf.{4,3}/

In the old method, we just did ^^^ and never had a "partial" cleanup like we do now.

I think there are two actionables here:

  1. Make sure we delete these directories as part of the first pass on scap clean. I think --force will handle this, but I need to test.
  2. Really really really f'ing make sure old directories get pruned. The fact that we had 1.29.0-wmf.* branches around is pretty f'ing embarrassing.

Mentioned in SAL (#wikimedia-operations) [2018-02-21T15:44:43Z] <no_justification> pruned old 1.29.x and 1.30.x versions that somehow stuck around. Also 1.31.0-wmf.* cache/ directories for unused branches. T157030

I'm curious if scap clean is the wrong approach. A daily (or heck, weekly even) cron that Does The Right Thing is less likely to be forgotten (I'm the only one who ever runs clean) and it would prevent #2 above ^

Just to create a small test case that demos what's going wrong:

$ mkdir -p rsync1/versions/version1/cache/l10n
$ touch rsync1/versions/version1/cache/l10n/en.{cdb,json}
$ rsync -avz --delete '--exclude=**/cache/l10n/*.cdb' rsync1/ rsync2/
sending incremental file list
created directory rsync2
./
versions/
versions/version1/
versions/version1/cache/
versions/version1/cache/l10n/
versions/version1/cache/l10n/en.json

sent 246 bytes  received 87 bytes  666.00 bytes/sec
total size is 0  speedup is 0.00
$ touch rsync2/versions/version1/cache/l10n/en.cdb
$ rm -rf rsync1/versions/version1
$ rsync -avz --delete '--exclude=**/cache/l10n/*.cdb' rsync1/ rsync2/
sending incremental file list
deleting versions/version1/cache/l10n/en.json
cannot delete non-empty directory: versions/version1/cache/l10n
cannot delete non-empty directory: versions/version1/cache/l10n
cannot delete non-empty directory: versions/version1/cache
cannot delete non-empty directory: versions/version1/cache
cannot delete non-empty directory: versions/version1

sent 95 bytes  received 372 bytes  934.00 bytes/sec
total size is 0  speedup is 0.00

AFAICT, rsync will delete a destination directory with files in it as long as those files aren't explicitly excluded, as they are in: https://github.com/wikimedia/scap/blob/master/scap/tasks.py#L58-L59

I think all clean/cron needs to do is: rm -rf [version]/cache/l10n, delete files on tin, let rsync take care of the rest.
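
The ordering suggested here can be sketched against the same kind of toy directories. This is only an illustration under assumptions: the rsync1/rsync2 layout stands in for the deployment host and a target, everything lives in a temporary directory, and it is not the actual scap clean implementation.

```shell
#!/bin/sh
# Sketch only: remove the excluded .cdb cache on the "target" before
# rsync --delete runs, so nothing protected is left in the old tree.
work=$(mktemp -d)
src="$work/rsync1"   # stands in for the deployment host (assumption)
dst="$work/rsync2"   # stands in for a target host (assumption)
exclude='--exclude=**/cache/l10n/*.cdb'

mkdir -p "$src/versions/version1/cache/l10n"
touch "$src/versions/version1/cache/l10n/en.cdb" \
      "$src/versions/version1/cache/l10n/en.json"
rsync -a --delete "$exclude" "$src/" "$dst/"

# Simulate a cache file generated on the target, then retire the
# version on the source side.
touch "$dst/versions/version1/cache/l10n/en.cdb"
rm -rf "$src/versions/version1"

# The extra step: clear the excluded cache directory on the target first.
rm -rf "$dst/versions/version1/cache/l10n"

# With no excluded files left behind, --delete can prune the rest.
rsync -a --delete "$exclude" "$src/" "$dst/"
ls "$dst/versions"
```

The only difference from the failing transcript is the rm -rf of cache/l10n on the destination before the second rsync run.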

So we think --delete-excluded will solve this. The patch above adds support to Scap's core for this. I'll follow up with a change to the clean plugin to make use of it.

Won't be live in production for a while, but the warnings are basically harmless.

Change 424645 had a related patch set uploaded (by Chad; owner: Chad):
[operations/mediawiki-config@master] scap clean: Use --delete-excluded

https://gerrit.wikimedia.org/r/424645

Change 424645 abandoned by Chad:
scap clean: Use --delete-excluded

Reason:
No, we should absolutely not use --delete-excluded.

https://gerrit.wikimedia.org/r/424645
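
The abandon message does not spell out the reason. One plausible concern, which is an assumption and not something stated in this task, is that --delete-excluded removes every excluded file on the receiving side, including the locally built l10n cache of a version that is still in use. A minimal sketch in a temporary directory, with a hypothetical "live" version:

```shell
#!/bin/sh
# Sketch only: compare plain --delete with --delete-excluded for a
# version that still exists on the source. Names are hypothetical.
work=$(mktemp -d)
src="$work/rsync1"
dst="$work/rsync2"
exclude='--exclude=**/cache/l10n/*.cdb'
cdb="$dst/versions/live/cache/l10n/en.cdb"

mkdir -p "$src/versions/live/cache/l10n"
touch "$src/versions/live/cache/l10n/en.json"
rsync -a --delete "$exclude" "$src/" "$dst/"

# A cache file that exists only on the target.
touch "$cdb"

# Plain --delete: excluded files on the receiver are left alone.
rsync -a --delete "$exclude" "$src/" "$dst/"
if [ -e "$cdb" ]; then after_plain=kept; else after_plain=deleted; fi

# --delete-excluded: excluded files on the receiver are deleted as well.
rsync -a --delete --delete-excluded "$exclude" "$src/" "$dst/"
if [ -e "$cdb" ]; then after_excluded=kept; else after_excluded=deleted; fi

echo "plain --delete: $after_plain; --delete-excluded: $after_excluded"
```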

Just got this while syncing:

13:38:32 Started rsync common
cannot delete non-empty directory: php-1.31.0-wmf.28/cache/l10n
cannot delete non-empty directory: php-1.31.0-wmf.28/cache/l10n
cannot delete non-empty directory: php-1.31.0-wmf.28/cache
cannot delete non-empty directory: php-1.31.0-wmf.28/cache
cannot delete non-empty directory: php-1.31.0-wmf.28

Change 441920 had a related patch set uploaded (by Thcipriani; owner: Thcipriani):
[operations/mediawiki-config@master] Scap clean: remove remote cache directory

https://gerrit.wikimedia.org/r/441920

Change 441920 merged by jenkins-bot:
[operations/mediawiki-config@master] Scap clean: remove remote cache directory

https://gerrit.wikimedia.org/r/441920

thcipriani claimed this task.

I think this problem should be resolved going forward.

Mentioned in SAL (#wikimedia-operations) [2021-06-22T11:58:13Z] <Lucas_WMDE> lucaswerkmeister-wmde@mwdebug1001:~$ sudo -u mwdeploy sh -c 'rm /srv/mediawiki/php-1.37.0-wmf.1/cache/l10n/l10n_cache-*.cdb && rmdir /srv/mediawiki/php-1.37.0-wmf.1/cache/l10n && rmdir /srv/mediawiki/php-1.37.0-wmf.1/cache && rmdir /srv/mediawiki/php-1.37.0-wmf.1' # per comments in T157030 and similar tasks

Mentioned in SAL (#wikimedia-operations) [2021-06-28T11:20:36Z] <Lucas_WMDE> lucaswerkmeister-wmde@mw1384:~$ sudo -u mwdeploy sh -c 'rm /srv/mediawiki/php-1.37.0-wmf.1/cache/l10n/l10n_cache-*.cdb && rmdir /srv/mediawiki/php-1.37.0-wmf.1/cache/l10n && rmdir /srv/mediawiki/php-1.37.0-wmf.1/cache && rmdir /srv/mediawiki/php-1.37.0-wmf.1' # per comments in T157030 and similar tasks
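
The manual cleanup logged above follows one pattern: delete the cdb files, then rmdir each level so that anything unexpected stops the removal. A hedged sketch of that pattern as a function; remove_stale_version is a hypothetical helper, not part of scap, and the demo runs against a temporary directory instead of /srv/mediawiki and without sudo.

```shell
#!/bin/sh
# Sketch only. remove_stale_version BASE VERSION
# Deletes the l10n cdb files, then removes each directory level with
# rmdir, which fails (and leaves the tree alone) if anything else remains.
# Unlike the logged command it uses rm -f, so a missing cdb is not an error.
remove_stale_version() {
    dir="$1/php-$2"
    rm -f "$dir"/cache/l10n/l10n_cache-*.cdb &&
        rmdir "$dir/cache/l10n" &&
        rmdir "$dir/cache" &&
        rmdir "$dir"
}

# Demo against a throwaway tree (hypothetical file name).
base=$(mktemp -d)
mkdir -p "$base/php-1.37.0-wmf.1/cache/l10n"
touch "$base/php-1.37.0-wmf.1/cache/l10n/l10n_cache-en.cdb"
remove_stale_version "$base" 1.37.0-wmf.1
```

Using rmdir instead of rm -rf is the design choice worth keeping: a version directory that still holds real code will not be removed by accident.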