cannot delete non-empty directory: php-1.29.0-wmf.3 messages on 'scap sync' on mwdebug1002
Open, Low, Public

Description

I am guessing there are some permission issues here?

addshore@mwdebug1002:~$ scap pull
14:02:22 Copying to mwdebug1002.eqiad.wmnet from deployment.eqiad.wmnet
14:02:22 Started rsync common
cannot delete non-empty directory: php-1.29.0-wmf.4/cache/l10n
cannot delete non-empty directory: php-1.29.0-wmf.4/cache/l10n
cannot delete non-empty directory: php-1.29.0-wmf.4/cache
cannot delete non-empty directory: php-1.29.0-wmf.4/cache
cannot delete non-empty directory: php-1.29.0-wmf.4
cannot delete non-empty directory: php-1.29.0-wmf.3/cache/l10n
cannot delete non-empty directory: php-1.29.0-wmf.3/cache/l10n
cannot delete non-empty directory: php-1.29.0-wmf.3/cache
cannot delete non-empty directory: php-1.29.0-wmf.3/cache
cannot delete non-empty directory: php-1.29.0-wmf.3
14:02:33 Finished rsync common (duration: 00m 10s)
Addshore created this task. Feb 2 2017, 2:16 PM
Restricted Application added a subscriber: Aklapper. Feb 2 2017, 2:16 PM
Addshore moved this task from Backlog to Watching on the User-Addshore board.

Looks like these messages still appear today during EU swat!

mmodell edited projects, added Scap; removed scap2. Feb 10 2017, 6:22 PM
mmodell closed this task as Resolved. Feb 10 2017, 6:31 PM
mmodell claimed this task.
Addshore reopened this task as Open. Feb 13 2017, 2:23 PM

I just saw this again today in EU swat.

These are generally caused by files owned by l10nupdate that have the wrong permissions (not group writable) ... The solution would be to adjust the umask for the l10nupdate process.
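
As a rough illustration of the umask point, here is a sketch run on a scratch directory. It is not the actual l10nupdate setup; the directory and file names are made up, and it assumes GNU stat.

```shell
#!/bin/sh
# Hypothetical scratch area; nothing here touches /srv/mediawiki.
demo=$(mktemp -d)

# umask 022: the new directory is not group-writable, so other
# members of the group cannot delete the entries inside it.
(umask 022; mkdir "$demo/strict"; touch "$demo/strict/en.cdb")

# umask 002: new directories and files stay group-writable.
(umask 002; mkdir "$demo/shared"; touch "$demo/shared/en.cdb")

# Print octal mode and path for each entry (GNU stat syntax).
stat -c '%a %n' "$demo/strict" "$demo/strict/en.cdb" \
                "$demo/shared" "$demo/shared/en.cdb"
```

Deleting a file needs write permission on its parent directory, so the directory modes are the ones that matter for cleanup by a second user in the same group.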

thcipriani closed this task as Resolved. Feb 15 2017, 12:06 AM
thcipriani claimed this task.

What's happening here is that we removed the php-1.29.0-wmf.{3,4} directories on tin; however, rsync --delete does not have permission to remove files on a remote machine that are owned by a different user (in this case l10nupdate).

To fix this, run scap clean 1.29.0-wmf.3 from /srv/mediawiki-staging on tin, which manually removes the files from both the deployment hosts and the targets.

Mentioned in SAL (#wikimedia-operations) [2017-02-23T14:42:31Z] <addshore> addshore@tin scap clean 1.29.0-wmf.6 && scap clean 1.29.0-wmf.7 (to remove warning on scap pull on mwdebug1002, T157030)

zeljkofilipin reopened this task as Open. Feb 21 2018, 2:43 PM

This is happening again.

zfilipin@mwdebug1002:~$ scap pull
14:18:30 Copying to mwdebug1002.eqiad.wmnet from tin.eqiad.wmnet
14:18:30 Started rsync common
cannot delete non-empty directory: php-1.31.0-wmf.17/cache/l10n
cannot delete non-empty directory: php-1.31.0-wmf.17/cache/l10n
cannot delete non-empty directory: php-1.31.0-wmf.17/cache
cannot delete non-empty directory: php-1.31.0-wmf.17/cache
cannot delete non-empty directory: php-1.31.0-wmf.17
cannot delete non-empty directory: php-1.31.0-wmf.20/cache/l10n
cannot delete non-empty directory: php-1.31.0-wmf.20/cache/l10n
cannot delete non-empty directory: php-1.31.0-wmf.20/cache
14:18:33 Finished rsync common (duration: 00m 03s)
zeljkofilipin closed this task as Resolved. Feb 21 2018, 2:45 PM
zeljkofilipin reopened this task as Open. Feb 21 2018, 2:48 PM
zeljkofilipin reassigned this task from thcipriani to demon.
zeljkofilipin added a subscriber: demon.

@thcipriani said (in #wikimedia-operations) @demon is the best person to take a look at this. :)

Addshore removed a subscriber: Addshore. Feb 21 2018, 2:51 PM
demon added a comment. Feb 21 2018, 3:26 PM

Has nothing to do with scap clean. We've been fighting this same error message for years.

I thought this was solved in scap clean at some point? IIRC the root problem is that we have to remove the l10n cdb files explicitly before running a scap pull, since rsync will refuse to delete them because they are owned by l10nupdate and not mwdeploy.

demon added a comment. Edited Feb 21 2018, 3:35 PM

Sorta. From what I can tell, rsync won't delete a destination directory if it still has files in it (considering the source directory didn't have them). mwdeploy can delete them (and in fact, it's what I use to delete them).

From reading rsync(1), I think we could get away with slapping a --force on the arg list.

demon added a comment. Feb 21 2018, 3:37 PM

Fwiw, this works just fine:

SSH_AUTH_SOCK=/run/keyholder/proxy.sock dsh -F 20 -M -g mediawiki-installation -r ssh -o -oUser=mwdeploy -- rm -rf /srv/mediawiki/php-1.31.0-wmf.{4,3}/
demon added a comment. Feb 21 2018, 3:37 PM

In the old method, we just did ^^^ and never had a "partial" cleanup like we do now.

demon added a comment. Feb 21 2018, 3:42 PM

I think there are two actionables here:

  1. Make sure we delete these directories as part of the first pass on scap clean. I think --force will handle this but I need to test
  2. Really really really f'ing make sure old directories get pruned. The fact that we had 1.29.0-wmf.* branches around is pretty f'ing embarrassing.

Mentioned in SAL (#wikimedia-operations) [2018-02-21T15:44:43Z] <no_justification> pruned old 1.29.x and 1.30.x versions that somehow stuck around. Also 1.31.0-wmf.* cache/ directories for unused branches. T157030

demon added a comment. Feb 21 2018, 3:46 PM

I'm curious if scap clean is the wrong approach. A daily (or heck, weekly even) cron that Does The Right Thing is less likely to be forgotten (I'm the only one who ever runs clean) and it would prevent #2 above ^
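
A minimal sketch of what the listing half of such a cron job might look like. The function name list_stale, the scratch directory, and the "active" version passed in are all hypothetical; how the real set of active branches would be determined is not covered in this thread. The sketch only prints candidates and deletes nothing.

```shell
#!/bin/sh
# list_stale ROOT ACTIVE...: print php-* directories under ROOT whose
# version is not among the ACTIVE arguments. Dry run: prints only.
list_stale() {
    root=$1
    shift
    for dir in "$root"/php-*; do
        [ -d "$dir" ] || continue
        version=${dir##*/php-}      # strip everything up to "php-"
        keep=no
        for active in "$@"; do
            if [ "$version" = "$active" ]; then
                keep=yes
            fi
        done
        if [ "$keep" = no ]; then
            printf '%s\n' "$dir"
        fi
    done
}

# Demo on a scratch directory with made-up contents.
stage=$(mktemp -d)
mkdir "$stage/php-1.29.0-wmf.3" "$stage/php-1.31.0-wmf.17" "$stage/php-1.31.0-wmf.21"
stale=$(list_stale "$stage" 1.31.0-wmf.21)
printf '%s\n' "$stale"
```

A real job would feed the printed paths into whatever removal step is agreed on, after a human has reviewed the dry-run output at least once.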

Just to create a small test case that demos what's going wrong:

$ mkdir -p rsync1/versions/version1/cache/l10n
$ touch rsync1/versions/version1/cache/l10n/en.{cdb,json}
$ rsync -avz --delete '--exclude=**/cache/l10n/*.cdb' rsync1/ rsync2/
sending incremental file list
created directory rsync2
./
versions/
versions/version1/
versions/version1/cache/
versions/version1/cache/l10n/
versions/version1/cache/l10n/en.json

sent 246 bytes  received 87 bytes  666.00 bytes/sec
total size is 0  speedup is 0.00
$ touch rsync2/versions/version1/cache/l10n/en.cdb
$ rm -rf rsync1/versions/version1
$ rsync -avz --delete '--exclude=**/cache/l10n/*.cdb' rsync1/ rsync2/
sending incremental file list
deleting versions/version1/cache/l10n/en.json
cannot delete non-empty directory: versions/version1/cache/l10n
cannot delete non-empty directory: versions/version1/cache/l10n
cannot delete non-empty directory: versions/version1/cache
cannot delete non-empty directory: versions/version1/cache
cannot delete non-empty directory: versions/version1

sent 95 bytes  received 372 bytes  934.00 bytes/sec
total size is 0  speedup is 0.00

AFAICT, it'll delete a directory with files in it as long as those files aren't explicitly excluded as in: https://github.com/wikimedia/scap/blob/master/scap/tasks.py#L58-L59

I think all clean/cron needs to do is: rm -rf [version]/cache/l10n, delete files on tin, let rsync take care of the rest.
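
The two-step cleanup described here can be tried locally along these lines. This is a sketch on scratch directories that mirror the test case; the paths are made up and it is not what scap clean actually runs.

```shell
#!/bin/sh
# Rebuild the end state of the test case: the version is gone from the
# source, but the destination still holds an excluded en.cdb.
work=$(mktemp -d)
mkdir -p "$work/rsync1/versions"
mkdir -p "$work/rsync2/versions/version1/cache/l10n"
touch "$work/rsync2/versions/version1/cache/l10n/en.cdb"

# Step 1: remove the excluded l10n cache on the destination by hand.
rm -rf "$work/rsync2/versions/version1/cache/l10n"

# Step 2: let rsync --delete prune what is left
# (skipped if rsync is not installed).
if command -v rsync >/dev/null 2>&1; then
    rsync -a --delete '--exclude=**/cache/l10n/*.cdb' \
        "$work/rsync1/" "$work/rsync2/"
fi

# Show what remains on the destination side.
ls -R "$work/rsync2"
```

With the excluded files out of the way, nothing inside the stale version directory is protected by the exclude rule, so rsync has no reason to skip it.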

So we think --delete-excluded will solve this. The patch above adds support to Scap's core for this. I'll follow-up with a change to the clean plugin to make use of it.

Won't be live in production for a while, but the warnings are basically harmless.
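
For reference, a sketch of what --delete-excluded changes in rsync itself, on made-up scratch paths. It says nothing about how Scap would wire the option in; it only shows that the flag also removes excluded files from the destination, including inside directories that still exist on the source.

```shell
#!/bin/sh
# Hypothetical layout: version1 still exists on the source, and the
# destination has an en.cdb that exists only there.
box=$(mktemp -d)
mkdir -p "$box/src/version1/cache/l10n" "$box/dst/version1/cache/l10n"
touch "$box/src/version1/cache/l10n/en.json"
touch "$box/dst/version1/cache/l10n/en.cdb"

# With --delete-excluded, files matching the exclude pattern are no
# longer protected on the destination (skipped if rsync is absent).
if command -v rsync >/dev/null 2>&1; then
    rsync -a --delete --delete-excluded '--exclude=**/cache/l10n/*.cdb' \
        "$box/src/" "$box/dst/"
fi

# List what is left in the destination l10n cache.
ls "$box/dst/version1/cache/l10n"
```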

demon triaged this task as Low priority. Feb 22 2018, 12:44 AM

Change 424645 had a related patch set uploaded (by Chad; owner: Chad):
[operations/mediawiki-config@master] scap clean: Use --delete-excluded

https://gerrit.wikimedia.org/r/424645

Change 424645 abandoned by Chad:
scap clean: Use --delete-excluded

Reason:
No, we should absolutely not use --delete-excluded.

https://gerrit.wikimedia.org/r/424645