While assembling the privacy statement for this work, we decided to anonymize welcome survey data one year after it is collected. A year should be long enough for any programs aimed at new editors to play out, and it ensures we do not keep survey data indefinitely. If the feature evolves into something more like "building a user profile", or into a setup where users can change or erase their own responses, we can revisit this.
Here are the rules we want to implement:
- First, one year after the welcome survey responses are submitted, @nettrom_WMF archives them in aggregate in a monthly summary table that keeps neither user IDs nor exact timestamps. This will support longitudinal analysis of how newcomer goals shift over time.
- Then, once they have been archived in aggregate, the responses are deleted from the database tables. From that point on, any features that rely on welcome survey responses will (ideally automatically) show the user the same default content that a user who never answered the welcome survey sees.
- Implementation: old welcomesurvey-responses user option rows are deleted. Any code that needs welcome survey data is responsible for handling the case where no such data exists for a given user. Deletion is done by a twice-monthly cron job that runs on the 1st and 15th of each month and deletes data older than 11 months. Because at most 17 days pass between runs, no response should be kept past the one-year mark.
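The aggregation in the first rule can be sketched as follows. This is a minimal Python illustration under stated assumptions, not the actual archiving query: the SurveyResponse shape and its single reason answer are hypothetical, and a real summary table would likely cover several survey questions. The point it shows is that only the calendar month and the answer survive; user IDs and exact timestamps are dropped.

```python
from collections import Counter
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class SurveyResponse:
    # Hypothetical shape of one stored welcome survey response.
    user_id: int
    submitted_at: datetime
    reason: str  # hypothetical question, e.g. why the account was created


def monthly_summary(responses):
    """Aggregate responses into {(month, reason): count}.

    Only the "YYYY-MM" month and the answer are kept; user IDs and
    exact timestamps are discarded.
    """
    counts = Counter(
        (r.submitted_at.strftime("%Y-%m"), r.reason) for r in responses
    )
    return dict(sorted(counts.items()))


if __name__ == "__main__":
    sample = [
        SurveyResponse(1, datetime(2019, 3, 2, 10, 5), "edit-typo"),
        SurveyResponse(2, datetime(2019, 3, 28, 22, 40), "edit-typo"),
        SurveyResponse(3, datetime(2019, 4, 1, 8, 0), "new-article"),
    ]
    for (month, reason), n in monthly_summary(sample).items():
        print(month, reason, n)  # "2019-03 edit-typo 2", then "2019-04 new-article 1"
```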
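The deletion and fallback rules can be sketched in the same way. This is again a hypothetical Python model rather than the actual maintenance script: a plain dictionary stands in for the welcomesurvey-responses user option rows, and the submitted_at field, DEFAULT_RESPONSES, and all helper names are assumptions made for illustration.

```python
import calendar
from datetime import datetime


def months_before(moment, months):
    """Shift `moment` back by `months` calendar months, clamping the day (e.g. Mar 31 -> Feb 29)."""
    total = moment.year * 12 + (moment.month - 1) - months
    year, month_index = divmod(total, 12)
    month = month_index + 1
    day = min(moment.day, calendar.monthrange(year, month)[1])
    return moment.replace(year=year, month=month, day=day)


def purge_old_responses(store, now, max_age_months=11):
    """Delete responses submitted before the cutoff and return the deleted user IDs.

    `store` is a hypothetical stand-in for the user option rows:
    user_id -> {"submitted_at": datetime, ...answers}.
    """
    cutoff = months_before(now, max_age_months)
    expired = [uid for uid, resp in store.items() if resp["submitted_at"] < cutoff]
    for uid in expired:
        del store[uid]
    return expired


# Hypothetical default: what a user who never answered the survey gets.
DEFAULT_RESPONSES = {}


def get_responses(store, user_id):
    """Readers handle missing data by falling back to the "not answered" default."""
    return store.get(user_id, dict(DEFAULT_RESPONSES))
```

Putting the fallback in the reader mirrors the rule that code needing welcome survey data, not the deletion job, handles missing responses: once a row is purged, that user simply looks like someone who never answered.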
The maintenance script has been merged and tested on cswiki and kowiki; the cron job patches still need to be merged.