Users want to download records of the "User data" (T208636) that the Foundation keeps about them. We need to make sure these reports can be assembled and downloaded without straining the system or causing undue wait times.
Pinging Ops.
Some things we need to consider:
- How big will the resulting file be?
- Will we store the file somewhere and provide a link, or stream it back in the response?
- If we store it, where is that storage? How much space is there? How long will these files be available?
- If we stream, what timeouts should we consider? If the stream fails, can it be restarted or does a new file need to be generated?
- Once the queries/API requests are prepared, we should have the DBAs review them.
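One way a streamed download could be made restartable is sketched below, under assumptions: the report is read in fixed-size batches using keyset pagination (each query asks only for records whose id is greater than the last one sent), and each output line is a JSON record that carries its own id. If the stream fails, the client could resume by passing back the last id it received instead of a new file being generated. `stream_report` and `fetch_page` are hypothetical names, not existing MediaWiki or API functions, and the batch size of 500 is arbitrary.

```python
import json
from typing import Callable, Iterator, List

# Hypothetical data-access callback: given a cursor (the last record id already
# sent) and a limit, return up to `limit` records with id > cursor, ordered by id.
FetchPage = Callable[[int, int], List[dict]]


def stream_report(fetch_page: FetchPage, after_id: int = 0,
                  batch_size: int = 500) -> Iterator[str]:
    """Yield a user's records as newline-delimited JSON, one batch at a time.

    Passing the id of the last record a client received as `after_id`
    resumes the stream from the next record instead of starting over.
    """
    cursor = after_id
    while True:
        rows = fetch_page(cursor, batch_size)
        if not rows:
            return  # no more records: the report is complete
        for row in rows:
            # Each line is self-describing, so a partial download is still usable.
            yield json.dumps(row, sort_keys=True) + "\n"
        # Advance the keyset cursor to the last id in this batch.
        cursor = rows[-1]["id"]
```

In a real handler, `fetch_page` would wrap a query that filters on `id > cursor`, orders by `id` and applies a `LIMIT`; small, index-friendly queries like that are the kind of thing the DBAs would want to check, and they avoid holding one long-running query open for the whole download.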
UI Questions
As part of this investigation, we need to answer a few questions relating to the UI (tracked in T208889). I think our preference would be to keep the UI simple and let users download all their data for all time in one go. But if that turns out to be too resource-intensive, we might wish to break things up. Some of the UI questions we have, then, are:
- One report or multiple? Will we have one download for both Contribution and User Data reports? Or (presumably because file size and/or time is an issue) will users select from among multiple reports, e.g., some subset of Account data, Preferences, My Edits, My Logged Actions...?
- Forever or by time slice? If the files are really big, we might give users the option of downloading Contributions for only the time period that interests them, e.g., one year at a time.
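If time slicing turns out to be necessary, computing the slices themselves is straightforward. A minimal sketch, assuming calendar-year slices (as in the one-year example above) and half-open [start, end) windows so that no record lands in two slices; `yearly_slices` is a hypothetical helper, and the start date would presumably be the account's registration date.

```python
from datetime import datetime
from typing import Iterator, Tuple


def yearly_slices(start: datetime, end: datetime) -> Iterator[Tuple[datetime, datetime]]:
    """Yield half-open (slice_start, slice_end) windows, one per calendar year,
    that together cover exactly the range from start to end."""
    current = start
    while current < end:
        # First instant of the next calendar year, keeping the same tzinfo
        # (or lack of one) so comparisons with `end` stay valid.
        next_year = datetime(current.year + 1, 1, 1, tzinfo=current.tzinfo)
        slice_end = min(next_year, end)
        yield current, slice_end
        current = slice_end
```

For an account registered on 2015-06-01 and a download requested on 2018-03-15, this yields four slices: the rest of 2015, all of 2016, all of 2017, and 2018 up to the request date. Each slice could then bound one smaller query or file instead of a single all-time report.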