
filearchive table not available on labs
Closed, Resolved (Public)

Description

Given the usefulness of having metadata including sha of deleted files, and that it is available on the toolserver it should be exposed on labs.


Version: unspecified
Severity: normal
See Also:
https://bugzilla.wikimedia.org/show_bug.cgi?id=49088

Details

Reference
bz61813

Event Timeline

bzimport raised the priority of this task to High. Nov 22 2014, 3:05 AM
bzimport added a project: Cloud-VPS.
bzimport set Reference to bz61813.
bzimport added a subscriber: Unknown Object (MLST).

That information is not available to normal users on the project, and therefore requires an okay by Legal to clear. Toolserver had imperfectly sanitized replication, and there were quite a few things available there that never should have been without clearance. :-)

Adding Luis to the bug so that they can opine.

Is https://www.mediawiki.org/wiki/Manual:Filearchive_table the best place to figure out what is actually in the relevant table? And do we want all fields or just some?

I would prefer as much as possible. The only field that should contain sensitive information is fa_description.

The current toolserver view seems to be everything but fa_description and fa_sha1.

  • fa_description should be left out as it might contain private info
  • fa_sha1 is quite recent (1.21) so I think we just never added it at the Toolserver
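A minimal sketch of the kind of sanitized view being discussed, assuming a view that exposes every column except fa_description. It uses Python's sqlite3 with an in-memory database and a trimmed column set purely for illustration; the view name filearchive_public and the sample row are hypothetical, and the real replicas run MySQL rather than SQLite.

```python
import sqlite3

# Toy stand-in for the real table: only a few columns, in-memory SQLite.
conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE filearchive (
        fa_id INTEGER PRIMARY KEY,
        fa_name TEXT NOT NULL,
        fa_description TEXT,   -- may contain private info
        fa_sha1 TEXT,
        fa_deleted INTEGER NOT NULL DEFAULT 0
    )
    """
)
conn.execute(
    "INSERT INTO filearchive VALUES (1, 'Example.jpg', 'private note', 'abc123', 0)"
)

# Hypothetical sanitized view: every column except fa_description.
conn.execute(
    """
    CREATE VIEW filearchive_public AS
    SELECT fa_id, fa_name, fa_sha1, fa_deleted
    FROM filearchive
    """
)

cursor = conn.execute("SELECT * FROM filearchive_public")
public_columns = [col[0] for col in cursor.description]
public_rows = cursor.fetchall()
print(public_columns)
print(public_rows)
```

Listing the columns explicitly, rather than using SELECT *, means a column added to the underlying table later is not exposed until someone reviews it.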

mysql> describe filearchive;
+----------------------+--------------------------------------------------------------------------------------------------------+------+-----+---------+-------+
| Field                | Type                                                                                                   | Null | Key | Default | Extra |
+----------------------+--------------------------------------------------------------------------------------------------------+------+-----+---------+-------+
| fa_id                | int(11)                                                                                                | NO   |     | 0       |       |
| fa_name              | varbinary(255)                                                                                         | NO   |     |         |       |
| fa_archive_name      | varbinary(255)                                                                                         | YES  |     |         |       |
| fa_storage_group     | varbinary(16)                                                                                          | YES  |     | NULL    |       |
| fa_storage_key       | varbinary(64)                                                                                          | YES  |     |         |       |
| fa_deleted_user      | int(11)                                                                                                | YES  |     | NULL    |       |
| fa_deleted_timestamp | varbinary(14)                                                                                          | YES  |     |         |       |
| fa_deleted_reason    | blob                                                                                                   | YES  |     | NULL    |       |
| fa_size              | int(8) unsigned                                                                                        | YES  |     | 0       |       |
| fa_width             | int(5)                                                                                                 | YES  |     | 0       |       |
| fa_height            | int(5)                                                                                                 | YES  |     | 0       |       |
| fa_metadata          | mediumblob                                                                                             | YES  |     | NULL    |       |
| fa_bits              | int(3)                                                                                                 | YES  |     | 0       |       |
| fa_media_type        | enum('UNKNOWN','BITMAP','DRAWING','AUDIO','VIDEO','MULTIMEDIA','OFFICE','TEXT','EXECUTABLE','ARCHIVE') | YES  |     | NULL    |       |
| fa_major_mime        | enum('unknown','application','audio','image','text','video','message','model','multipart')             | YES  |     | unknown |       |
| fa_minor_mime        | varbinary(32)                                                                                          | YES  |     | unknown |       |
| fa_user              | int(5) unsigned                                                                                        | YES  |     | 0       |       |
| fa_user_text         | varbinary(255)                                                                                         | YES  |     |         |       |
| fa_timestamp         | varbinary(14)                                                                                          | YES  |     |         |       |
| fa_deleted           | tinyint(1) unsigned                                                                                    | NO   |     | 0       |       |
+----------------------+--------------------------------------------------------------------------------------------------------+------+-----+---------+-------+
20 rows in set (0.00 sec)

I know we've seen crazy things be put in filenames before - is that oversightable? Otherwise, agree that fa_sha1 should not be problematic.

Oversight no longer exists, but pretty much anything can be rev_del'ed if that is what you are referring to. However I have never seen a case of a file name being problematic.

I think it was James who told me that there have been crazy file names in the past, but that may be a fever dream - James?

With regard to fa_description: is that normally publicly visible? I.e., would sensitive information in it be rev_del'd as part of normal site moderation/oversight? Because with other sensitive fields, one option is to simply respect revdel and keep it from being propagated.
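The "respect revdel" option could look roughly like the sketch below, which blanks the description for rows whose visibility bitfield marks it as hidden. It assumes that bit value 2 in fa_deleted means "description hidden"; that bit value, the view name filearchive_revdel, and the sample rows are assumptions for illustration, run here against in-memory SQLite rather than the real replicas.

```python
import sqlite3

# Assumption: bit value 2 in fa_deleted marks the description as hidden.
DELETED_COMMENT = 2

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE filearchive (fa_id INTEGER, fa_description TEXT, fa_deleted INTEGER)"
)
conn.executemany(
    "INSERT INTO filearchive VALUES (?, ?, ?)",
    [(1, "ordinary description", 0), (2, "suppressed description", DELETED_COMMENT)],
)

# Hypothetical view: blank the description whenever the hidden bit is set.
conn.execute(
    f"""
    CREATE VIEW filearchive_revdel AS
    SELECT fa_id,
           CASE WHEN fa_deleted & {DELETED_COMMENT} THEN NULL
                ELSE fa_description END AS fa_description,
           fa_deleted
    FROM filearchive
    """
)

rows = conn.execute(
    "SELECT fa_id, fa_description FROM filearchive_revdel ORDER BY fa_id"
).fetchall()
print(rows)
```

This only helps if sensitive descriptions are reliably rev_del'd in the first place, which is the open question in the comment above.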

There is, IMO, a plausible issue with the SHA, but I don't know whether it is relevant for legal: its primary use case is (of course) to note files which have been previously uploaded then deleted, but it therefore necessarily allows any third party to determine whether any specific file whose hash they have has been uploaded in the past.
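To make the concern concrete: anyone holding a file can compute its hash locally and compare it against exposed fa_sha1 values. The sketch below assumes the column stores the SHA-1 digest in base 36, zero-padded to 31 characters; that storage format, the helper names, and the sample data are assumptions for illustration only.

```python
import hashlib

BASE36_DIGITS = "0123456789abcdefghijklmnopqrstuvwxyz"


def sha1_base36(data: bytes) -> str:
    """SHA-1 of data in base 36, zero-padded to 31 chars (assumed format)."""
    value = int(hashlib.sha1(data).hexdigest(), 16)
    digits = []
    while value:
        value, remainder = divmod(value, 36)
        digits.append(BASE36_DIGITS[remainder])
    return "".join(reversed(digits)).rjust(31, "0")


def was_previously_uploaded(data: bytes, exposed_hashes: set) -> bool:
    """True if the file's hash appears among the exposed fa_sha1 values."""
    return sha1_base36(data) in exposed_hashes


# Hypothetical set of hashes read from an exposed filearchive view.
exposed = {sha1_base36(b"contents of a deleted file")}
print(was_previously_uploaded(b"contents of a deleted file", exposed))
print(was_previously_uploaded(b"some other file", exposed))
```

The check needs nothing but the file itself and read access to the hash column, which is why the hash confirms content rather than just a name.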

Could this be used by, say, a government agency to find who uploaded some files that they were displeased with?

Can't they already do that by simply uploading the file instead of the SHA?

At best they could tell that some file with the same /name/ existed; the SHA will confirm content. AFAIK, uploading doesn't check against deleted files' SHAs.

(In reply to Marc A. Pelletier from comment #11)

At best they could tell that some file with the same /name/ existed; the SHA
will confirm content. AFAIK, uploading doesn't check against deleted files'
SHAs.

I may be wrong, but I believe it does (it tells you that the same file is uploaded at X, and I 'think' it also tells you that one was deleted before, though I'd have to double-check that).

*** This bug has been marked as a duplicate of bug 57697 ***

(In reply to Marc A. Pelletier from comment #11)

AFAIK, uploading doesn't check against deleted files' SHAs.

It does. And it tells you the title. From the title, look up the (public) logs and you have that user.