
[4 hour spike] Investigate performance issues relating to downloading the User Data report
Closed, Resolved (Public)

Description

Users want to download records of the "User data" (T208636) that the Foundation keeps about them. We need to make sure these reports can be assembled and downloaded without straining the system or causing undue wait times.

ping Ops

Some things we need to consider:

  • How big will the resulting file be?
  • Will we store the file somewhere and provide a link or stream it back from the request?
  • If we store it, where is that storage? How much space is there? How long will these files be available?
  • If we stream, what timeouts should we consider? If the stream fails, can it be restarted or does a new file need to be generated?
  • When we have the queries/API requests prepared, we should get DBAs to check them out.

UI Questions

As part of this investigation, we need to answer a few questions relating to the UI (tracked in T208889). I think our preference would be to keep the UI simple and let users download all their data for all time in one go. But if that turns out to be too resource-intensive, we might need to break things up. The UI questions we have are:

  • One report or multiple? Will we have one download for both Contribution and User Data reports? Or (presumably because file size and/or time is an issue) will users select from among multiple reports—e.g., some subset of Account data, Preferences, My Edits, My Logged Actions...
  • Forever or by time slice? If the files are really big, we might give users the option of downloading Contributions for only the time period that interests them (e.g., one year at a time).

Event Timeline

jmatazzoni renamed this task from Investigate performance issues relating to downloading 'Contribution' and 'User' data reports to Investigate performance issues relating to downloading 'Contribution' and 'User' data reports [4 hour spike]. Nov 7 2018, 12:33 AM
jmatazzoni moved this task from Ready to In Development on the Community-Tech-Sprint board.

Noting here that we are no longer going to provide the Contributions report.

jmatazzoni renamed this task from Investigate performance issues relating to downloading 'Contribution' and 'User' data reports [4 hour spike] to Investigate performance issues relating to downloading the User Data report [4 hour spike]. Nov 30 2018, 6:41 PM
MBinder_WMF renamed this task from Investigate performance issues relating to downloading the User Data report [4 hour spike] to [4 hour spike] Investigate performance issues relating to downloading the User Data report. Dec 11 2018, 12:48 AM
MBinder_WMF added a project: Spike.
aezell moved this task from In Development to Q2 2018-19 on the Community-Tech-Sprint board.

The upshot of all of this is the code will live in MediaWiki Core.

The file will be sent to the user's browser from memory. This is possible because we removed the Contributions data, which would have made the file large. With that removed, we don't need anything too fancy.
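
For illustration only, here is a rough sketch of what "sent from memory" means: the report is serialized into a byte buffer and returned with download headers, with no temporary file and no stored copy. This is not the MediaWiki Core implementation (which is PHP); the function names, the JSON format, and the filename are all assumptions made for the example.

```python
import io
import json


def build_report_bytes(user_data: dict) -> bytes:
    """Serialize the whole report into an in-memory buffer (no temp file)."""
    buffer = io.BytesIO()
    # JSON is an assumed format; the task does not specify one.
    buffer.write(json.dumps(user_data, indent=2, sort_keys=True).encode("utf-8"))
    return buffer.getvalue()


def download_response(user_data: dict, filename: str = "user-data.json"):
    """Return (headers, body) for a browser download; all names are hypothetical."""
    body = build_report_bytes(user_data)
    headers = {
        "Content-Type": "application/json; charset=utf-8",
        # "attachment" asks the browser to save the body rather than render it.
        "Content-Disposition": f'attachment; filename="{filename}"',
        "Content-Length": str(len(body)),
    }
    return headers, body
```

This only works while the report stays small enough to hold in memory for the length of one request, which is why dropping the Contributions data matters. If the download fails, the report is simply regenerated on the next request.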