Cleanup: 2017-07-02 Toolforge data loss for permissive data
Closed, Resolved, Public

Description

After a few reports of missing data, we believe a user inadvertently ran the command rm -fr * in the wrong context on July 2. This caused data belonging to other users and tools to go missing wherever permissions were broad enough for it to be caught up.

Suspected user interaction probably happened in these windows:

Jul  2 08:12:30 logged in
Jul  2 09:30:00 logged out

Jul  2 16:56:59 logged in
Jul  2 19:16:20 logged out

When looking in /data/project for Tools using sudo find . -perm -o+w -type d we find 146 Tools out of 1766 that show up with applicable perms.
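
A minimal sketch of how that count could be broken down further, assuming a find that accepts symbolic modes for -perm (GNU find does). The helper name count_world_writable is hypothetical and not part of the original investigation. It separates world-writable directories that carry the sticky bit (for example mode 1777 or 3777) from those that do not, since only the latter let an unrelated user delete entries.

```shell
#!/bin/bash
# Hypothetical helper: count world-writable directories under a root,
# split by whether the sticky bit is set.
count_world_writable() {
    local root=$1
    local sticky plain
    # -perm -o+w matches the "other" write bit; -perm -1000 matches the sticky bit.
    # $(( ... )) normalizes any whitespace padding that wc may emit.
    sticky=$(( $(find "$root" -type d -perm -o+w -perm -1000 2>/dev/null | wc -l) ))
    plain=$(( $(find "$root" -type d -perm -o+w ! -perm -1000 2>/dev/null | wc -l) ))
    echo "sticky=${sticky} unprotected=${plain}"
}
```

Calling count_world_writable /data/project would print a single summary line; directory names containing newlines would be miscounted by this sketch.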

We have a snapshot of the data pre-rm -fr * thankfully, though we don't have the capacity to keep this for long on our normal backup schedule.

Proposal atm:

  • Restore missing data in one go as well as we can, or ask users to notify us of their missing data and offer to restore within a certain time frame
  • Send a notice email to labs-announce
  • Include some brief best practices in-line with the notice and a link to a more verbose page on wikitech

sudo find . -perm -o+w -type d &> F8626225

time du -shc `cat /home/rush/tools-permissive-files` | sort -h &> F8626323

Event Timeline

Does anyone know how to contact this user?

https://phabricator.wikimedia.org/T169736#3407648

I'm here. It would be great to find a backup of the "ps" subfolder. It hasn't changed for a couple of years, so I hope it shouldn't be a problem.

what is the Tool name?

h2bot

We are still working out what to do here but fyi paths found via sudo find . -perm -o+w -type d are in P5684

Current status: best guess is that this affected about 128 Tools, and I'm trying to gather how much data we are talking about here, size-wise.

aswnbot
autolist
bawolff
blogconverter
bookmanagerv2
calakbot
captcha
captcha-dev
checkwiki
cite-o-meter
cluestuff
cobain
cocytus
copypatrol
cp
cropbot
datbot
denkmalliste
detox
dibot
dow
durl-shortener
elobot
emausbot
extreme
fatemi
fatg
fawikiauto
file-reuse
fiwiki-tools
gennfs
geoplotter
gns
gsociftttdev
h2bot
hartman
hawk-eye-bot
hennalabs
hostbot
hroest
ibrahim
icelab
ifttt-testing
incolabot
invadibot
irc-cloaks
ircredirector
itsource
jarry-common
jembot
kian
krdbot
labelimgohs
lbenedix
legobot
lestaty
lrbot
mahdiz
masscamps
mbh
mediawiki-feeds
merlbot
mg-bot
mifterbot
mifterbot-en
musikanimal
nagf
oojs-ui
owintes
pavlochembot
paws
phetools
pirsquared
plagiabot
pltools
project-fa
ptbot
ptwikis
pub
pywikibot
raehhamsang
rc-vikidia
readmore
remarkup2wikitext
revisions-blacklist
reza-dev
rezabot
rillke
rotatebot
rubinbot3
secwatch
selim
shbibbot
shbot
sigma
sit
sparqlblocks
spbot
static-bz
suggestbot
superzerocool
svwiktionary
templatetiger
tool-account
tour
urbanecmbot
viafbot
vltools
wakt
wanderwiki
wd-analyst
weeklypedia
wiki-talk
wiki-talk2
wikicaptcha
wikimetrics
wikisource-tweets
wikiviewstats2
wikiviz
wmve
wp-world
xecfork
xtools
yichengtry
zhwiki-teleirc
zppixbot

At this moment our strategy is probably going to be to restore these files to somewhere user accessible, as the size of the data and the side effects of attempting an unnecessary or unwelcome restore here are many. This will take a while as some of this is quite large, with a grand total of 1.2T.

483G ./shared/tools/project/.shared
is supposed to be okay since it's 3777, unless the user deleted their own files, but for some reason doing the du myself on live /data/project/.shared I get:
16G /data/project/.shared

Mentioned in SAL (#wikimedia-operations) [2017-07-06T19:23:17Z] <chasemp> labstore2003 time bash restore.sh &> /tmp/restore_7_6_2017v1.log for T169774

#!/bin/bash

cd /srv/backup/tools/shared/tools/project || exit 1
# 566f5ecc3bf0f3bdbd90a609cf966f70  /home/rush/tools-permissive-files
# Load the paths to restore, skipping comment lines
mapfile -t paths < <(grep -v '^#' /home/rush/tools-permissive-files)
# Number of paths to restore
echo "${#paths[@]}"
for f in "${paths[@]}"; do
    # tool project (fifth '/'-separated field of the listed path)
    project=$(echo "$f" | cut -d '/' -f 5)
    # relative path from project name: everything after the first "/<project>"
    restore_me=${f#*/"${project}"}

    echo "$project"
    echo "$restore_me"

    rsync \
        --timeout=30 --bwlimit=250000 \
        --archive --hard-links --inplace \
        --links --perms --times --relative -r \
        --rsh 'ssh -C -i /root/.ssh/id_labstore' \
        "${project}${restore_me}" \
        root@labstore1003.eqiad.wmnet:/srv/scratch/T169774/
done

This data should be available in the scratch share in a directory named T169774. Permissions, ownership, etc. should be preserved; however, I did execute chmod -R o-w T169774/ to remove write access for other. In the morning I will write up a proper announcement to loop in anyone who hasn't yet noticed they need this service :)

Demonstration of o+w effect on a directory and file removal by a foreign user:

(as root)# touch foo/bar/vuln/myfile
(as root)# tree
.
└── foo
    └── bar
        └── vuln
            └── myfile

3 directories, 1 file
(as root)# ls -ald foo/bar/vuln
drwxrwxr-x 2 root root 4096 Jul  7 19:50 foo/bar/vuln

(as root)# ls -al foo/bar/vuln/myfile
-rw-r--r-- 1 root root 0 Jul  7 19:50 foo/bar/vuln/myfile

(as root)# ls -ald foo/bar/vuln
drwxrwxr-x 2 root root 4096 Jul  7 19:50 foo/bar/vuln

(as regular user)#rm -f foo/bar/vuln/myfile
rm: cannot remove ‘foo/bar/vuln/myfile’: Permission denied

(as root)# chmod o+w foo/bar/vuln

(as regular user)#rm -f foo/bar/vuln/myfile

(as root)# ls -ald foo/bar/vuln
drwxrwxrwx 2 root root 4096 Jul  7 19:50 foo/bar/vuln
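
The demonstration above can be condensed into a single-user check. This is only a sketch: the helper name other_can_delete is hypothetical, it assumes GNU stat (the -c '%a' format), and it looks only at the mode of the immediate parent directory, ignoring ACLs, ownership and whether parent directories higher up are traversable.

```shell
#!/bin/bash
# Hypothetical helper: report whether an unrelated ("other") user could
# remove a path, judged only by the mode of its parent directory.
other_can_delete() {
    local parent mode other sticky
    parent=$(dirname -- "$1")
    mode=$(stat -c '%a' -- "$parent") || return 2
    # Removing an entry needs write and search (execute) on the directory:
    # for "other" that is bits o+wx, i.e. octal 3.
    other=$(( 8#$mode & 3 ))
    # With the sticky bit (octal 1000) set, only the owner of the entry
    # or of the directory may remove it.
    sticky=$(( 8#$mode & 01000 ))
    if (( other == 3 && sticky == 0 )); then
        echo "removable by other: $1"
        return 0
    fi
    echo "protected from other: $1"
    return 1
}
```

Against the transcript's layout, other_can_delete foo/bar/vuln/myfile would report the file as protected while the directory is drwxrwxr-x, and as removable once chmod o+w has been applied to foo/bar/vuln.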
chasemp renamed this task from Toolforge data loss for permissive data July 2 2017 to 2017-07-02 Toolforge data loss for permissive data. Jul 7 2017, 8:34 PM

On 2017-07-02 some users experienced a loss of data in their projects.  

We estimate 126 Tools out of 1,766 saw at least one file removed.  We have reason to believe a user acting under a Tool account issued the command 'rm -fr *' at the wrong point in the directory structure.  Anyone who had files that were removable by this user was affected.  Thankfully, we have a backup from before the command was run, only a minority of users have overly permissive files, and only a further minority of those were severely impacted.  https://phabricator.wikimedia.org/T169774 was created in response to inquiries surrounding data loss.

We do not guarantee any level of user backups for day-to-day operations, but in this case since we do have the data I have restored it to /data/scratch/T169774/ so users can retrieve what was removed.  We intend to make this restored data available until at least 2017-08-08.  A Warning: Please do not rely on NFS for backups of code or critical data.  We only have capacity to keep 2 weeks of historical backups at the moment and cannot guarantee timely retrieval or availability.  Every Tool account can use https://phabricator.wikimedia.org/diffusion/ for code hosting, and creation of the repository is handled by going to https://toolsadmin.wikimedia.org/tools/id/<mytool>.

This calamity was almost entirely caused by directories with o+w set, which allows 'other' (that is, everyone) write access.  Do not use permissions such as '777', or any that look like 'drwxrwxrwx', as they will allow other users to remove your files.  This is especially dangerous in a shared hosting environment, as this incident has shown.

A brief explanation of why this happened to users who have given write permissions to 'other' for a directory in their Tool:

Because directories are not used in the same way as regular files, the permissions work slightly (but only slightly) differently.  An attempt to list the files in a directory requires read permission for the directory, but not on the files within.  An attempt to add a file to a directory, delete a file from a directory, or to rename a file, all require write permission for the directory, but (perhaps surprisingly) not for the files within. 

-  Unix File and Directory Permissions and Modes (https://wpollock.com/AUnix1/FilePermissions.htm)

ACTION ITEMS:
- Make sure you have backups of code and data needed
- Check for removed data you want to restore on login.tools.wmflabs.org at /data/scratch/T169774/ 
- Check your Tools files and directories for o+w permission and remove if possible (chmod -R o-w <directory>).
- Ask for help on the labs-l mailing list, Phabricator, or in the #wikimedia-cloud IRC channel if you cannot figure out how to do without o+w (someone may have a different solution).
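
The permission check in the action items could be sketched as follows. The function name strip_other_write is hypothetical, and the sketch assumes a find that accepts -perm -o+w as used earlier in this task. Unlike a plain chmod -R o-w, it prints the affected paths first so you can see what was exposed, and it skips symlinks because their own mode is not meaningful.

```shell
#!/bin/bash
# Hypothetical helper: list everything under a directory that grants write
# access to "other", then remove that bit.
strip_other_write() {
    local dir=$1
    # Show which paths are currently writable by other
    find "$dir" ! -type l -perm -o+w -print
    # Remove the other-write bit from exactly those paths
    find "$dir" ! -type l -perm -o+w -exec chmod o-w {} +
}
```

Running strip_other_write on your Tool's home directory leaves owner and group permissions untouched; only the write bit for other is cleared.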

Mentioned in SAL (#wikimedia-operations) [2017-07-10T13:49:04Z] <chasemp> labstore2003:~# umount -fl /srv/backup/tools (for T169774 recovery)

Note: two larger Tools had their data synced after the initial restore, as they make up an overwhelming portion of it. I have placed these in 'archive' within the T169774 directory.

chasemp claimed this task.

Data is available for user restore, notifications have been sent to the lists, and I haven't seen new user queries in a few days.

chasemp renamed this task from 2017-07-02 Toolforge data loss for permissive data to Cleanup: 2017-07-02 Toolforge data loss for permissive data. Jul 11 2017, 7:32 PM
chasemp reopened this task as Stalled.

marked stalled so we remember to remove the temp restore data

chasemp lowered the priority of this task from High to Lowest. Jul 17 2017, 1:32 PM

I will remove this data in one week.