Page MenuHomePhabricator

Exception TypeError when trying to delete a file from media backups
Closed, ResolvedPublicBUG REPORT

Description

[2022-11-25 08:28:59,418] INFO:deletion Attempting to delete "commonswiki/38f/38f8f08e7cb41582e28db17fbcc6662eb51bdff9efc071c8c70e82662ab1e86a" from "https://backup1004.eqiad.wmnet:9000"
Traceback (most recent call last):
  File "/usr/bin/delete-media-file", line 33, in <module>
    sys.exit(load_entry_point('mediabackups==0.1.3', 'console_scripts', 'delete-media-file')())
  File "/usr/lib/python3/dist-packages/mediabackups/cli/delete_media_file.py", line 50, in main
    physically_deleted_files = iq.delete_files(file_list)
  File "/usr/lib/python3/dist-packages/mediabackups/InteractiveQuery.py", line 321, in delete_files
    if s3api.delete_file(backup_shard, backup_name) == -1:
  File "/usr/lib/python3/dist-packages/mediabackups/S3.py", line 108, in delete_file
    client.delete_object(self.bucket, name)
  File "/usr/lib/python3/dist-packages/botocore/client.py", line 354, in _api_call
    raise TypeError(
TypeError: delete_object() only accepts keyword arguments.

Event Timeline

jcrespo triaged this task as Unbreak Now! priority.

This is blocking deletion of files from backups.

This is something that bit other people, so maybe it was working before-even if deprecated, but the library upgrade made it fail:

https://stackoverflow.com/questions/63087362/aws-boto3-delete-objects-failing-with-typeerror-delete-objects-only-accepts-k

The fix looks trivial- just an explicitly named parameter.

Change 860831 had a related patch set uploaded (by Jcrespo; author: Jcrespo):

[operations/software/mediabackups@master] Fix parameter naming on deletion of an S3 object

https://gerrit.wikimedia.org/r/860831

Change 860832 had a related patch set uploaded (by Jcrespo; author: Jcrespo):

[operations/software/mediabackups@master] Prepare for release 0.1.4

https://gerrit.wikimedia.org/r/860832

The fix worked, deletion was sent this time, but we have an issue with permissions:

...
[2022-11-25 09:06:28,720] INFO:deletion Attempting to delete "commonswiki/38f/38f8f08e7cb41582e28db17fbcc6662eb51bdff9efc071c8c70e82662ab1e86a" from "https://backup1004.eqiad.wmnet:9000"
[2022-11-25 09:06:28,743] ERROR:root An error occurred (AccessDenied) when calling the DeleteObject operation: Access Denied.
[2022-11-25 09:06:28,743] ERROR:deletion "commonswiki/38f/38f8f08e7cb41582e28db17fbcc6662eb51bdff9efc071c8c70e82662ab1e86a" failed to be deleted from "https://backup1004.eqiad.wmnet:9000"
...

I will deploy the software update and have a look at permissions, most likely the user has upload, list and download permissions, but not deletion (or it is using the wrong user, the backup user rather than a more privileged account).

Change 860831 merged by Jcrespo:

[operations/software/mediabackups@master] Fix parameter naming on deletion of an S3 object

https://gerrit.wikimedia.org/r/860831

Change 860832 merged by Jcrespo:

[operations/software/mediabackups@master] Prepare for release 0.1.4

https://gerrit.wikimedia.org/r/860832

I know what the issue is- this automation was made initially just to list and restore files, never to modify them (deletion was a later addition). For this reason, these scripts use the read-only mediabackups account, which fails at performing the actual deletion.

For now, I will add the deletion permission to the read-only user to unblock this, but we will need a better model (e.g. 3 users- one for backups/uploads, one for restores/listing and one for deletion?).

Change 860838 had a related patch set uploaded (by Jcrespo; author: Jcrespo):

[operations/puppet@production] mediabackups: Add new policy intended for admin deletion of files

https://gerrit.wikimedia.org/r/860838

Issues solved now, with a last blocker: metadata is correctly updated on the files table (file is set as "hard-deleted") but the row on the backups table is not deleted, so this is detected as a "metadata deletion failure". Investigating.

Change 860850 had a related patch set uploaded (by Jcrespo; author: Jcrespo):

[operations/software/mediabackups@master] Check 1 row was affected after metadata deletion

https://gerrit.wikimedia.org/r/860850

Change 860850 abandoned by Jcrespo:

[operations/software/mediabackups@master] Check 1 row was affected after metadata deletion

Reason:

This is not the issue, the rows detection is right, what it is failing is the deletion itself

https://gerrit.wikimedia.org/r/860850

Change 860853 had a related patch set uploaded (by Jcrespo; author: Jcrespo):

[operations/software/mediabackups@master] deletion: Fix bug in query for metadata deletion

https://gerrit.wikimedia.org/r/860853

Change 860853 merged by Jcrespo:

[operations/software/mediabackups@master] deletion: Fix bug in query for metadata deletion

https://gerrit.wikimedia.org/r/860853

Last minor issue- there was outdated indexes on the codfw database, leading to full scans on deletion.

All issues fixed in release 0.1.5. This is no longer a blocker.

Only pending issues is rethinking the security model for backups/restores/deletions, but otherwise this is solved.

Change 860838 merged by Jcrespo:

[operations/puppet@production] mediabackups: Add new policy intended for admin deletion of files

https://gerrit.wikimedia.org/r/860838