
whitelist multimedia and upload wizard tables
Closed, Resolved | Public | 5 Estimated Story Points

Description

Instead of purging all the data from the multimedia and upload wizard tables after 90 days, can we simply purge the standard capsule?

I ask because with the new investment in a deeper multimedia team, it is likely these features will get attention and the historical data will be relevant:

MediaViewer (link to the others in the box on the right)
MultimediaViewerVersusPageFilePerformance
MultimediaViewerDuration
MultimediaViewerNetworkPerformance
MultimediaViewerAttribution

As far as I can see, none of these have any sensitive data outside of the standard data capsule that ships with each event. The same applies to upload wizard:

UploadWizardStep (link to the others in the box on the right)
UploadWizardFlowEvent
UploadWizardErrorFlowEvent
UploadWizardExceptionFlowEvent
UploadWizardUploadFlowEvent
UploadWizardTutorialActions
UploadWizardUploadActions

Event Timeline

Hi @JKatzWMF!

I've looked into the schemas, and they seem non-sensitive to me, provided that we purge the parsed (sanitized) userAgent field from the capsule after 90 days. That purge is necessary because some multimedia schemas contain the field 'country' and some upload schemas contain the field 'username'. Combined with the sanitized userAgent, those fields could still associate a location or username with a device (if the device is uncommon enough).

So, I will add the following schemas to the white-list, excluding the sanitized userAgent field, which will be purged after 90 days.
https://meta.wikimedia.org/wiki/Schema:MediaViewer
https://meta.wikimedia.org/wiki/Schema:MultimediaViewerVersusPageFilePerformance
https://meta.wikimedia.org/wiki/Schema:MultimediaViewerAttribution
https://meta.wikimedia.org/wiki/Schema:UploadWizardStep
https://meta.wikimedia.org/wiki/Schema:UploadWizardFlowEvent
https://meta.wikimedia.org/wiki/Schema:UploadWizardErrorFlowEvent
https://meta.wikimedia.org/wiki/Schema:UploadWizardExceptionFlowEvent
https://meta.wikimedia.org/wiki/Schema:UploadWizardUploadFlowEvent
https://meta.wikimedia.org/wiki/Schema:MultimediaViewerDuration
https://meta.wikimedia.org/wiki/Schema:MultimediaViewerNetworkPerformance
https://meta.wikimedia.org/wiki/Schema:UploadWizardTutorialActions
https://meta.wikimedia.org/wiki/Schema:UploadWizardUploadActions
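
To illustrate the retention rule described above, here is a minimal Python sketch. It is not the actual purging script or white-list format: the schema name `ExampleMultimediaSchema`, the field names, the `WHITELIST` structure, and the `sanitize` helper are all hypothetical. It only shows the idea that, once a row is older than 90 days, white-listed fields are kept and everything else (including userAgent) is nulled.

```python
from datetime import datetime, timedelta

RETENTION = timedelta(days=90)

# Hypothetical white-list: schema name -> fields kept after the retention window.
# userAgent is deliberately absent, so it is purged once a row is older than 90 days.
WHITELIST = {
    "ExampleMultimediaSchema": {"timestamp", "event_action", "event_country"},
}


def sanitize(schema, row, now):
    """Return the row as it should look at time `now`.

    Rows inside the retention window are returned unchanged. Older rows keep
    only white-listed fields (others become None). Old rows from schemas that
    are not white-listed are dropped entirely (None is returned).
    """
    if now - row["timestamp"] <= RETENTION:
        return dict(row)
    keep = WHITELIST.get(schema)
    if keep is None:
        return None
    return {key: (value if key in keep else None) for key, value in row.items()}


now = datetime(2017, 6, 22)
old_row = {
    "timestamp": datetime(2017, 1, 1),
    "event_action": "view",
    "event_country": "NL",
    "userAgent": "example-browser/1.0",
}
purged = sanitize("ExampleMultimediaSchema", old_row, now)
```

In this sketch `purged` keeps `event_action` and `event_country` but has `userAgent` set to `None`, which mirrors the "white-list everything except the sanitized userAgent" decision.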

Nuria set the point value for this task to 3. (Jun 5 2017, 9:18 PM)
Nuria changed the point value for this task from 3 to 5.

@JKatzWMF

I added the schema fields that we discussed to the white-list.
You can check it here: https://gerrit.wikimedia.org/r/#/c/298721/
The related changes are in patch set 8
Cheers!

Milimetric triaged this task as Medium priority. (Jun 22 2017, 3:09 PM)

I will move this task to done, because the editing of the white-list is finished and will be merged in a Gerrit patch belonging to another task: T156933.