
Review UploadWizard funnel (click tracking) data again
Closed, ResolvedPublic


We record some data about actions taken by the user in EventLogging (primarily, we record an event every time the user views a step of the upload process). We should probably look at it again; the last time anyone did was in 2014. In particular, I'd like to know how many of the users beginning an upload process (entering the "funnel") end up completing their uploads.

Old analyses:

I'm writing some queries and will post results here. Perhaps we should email them to the Multimedia list or put them on later.

Event Timeline


If you just want pretty plots, scroll down; here I just explain how I made them.

I wrote a query to reconstruct "upload sessions" from the steps we have logged. The ideal session is "tutorial,file,deeds,details,thanks", where the user goes through the entire workflow, uploads files, and closes the wizard afterwards. If it's shorter, the user gave up before completing. If it's longer, they clicked "Upload more files" at the end and went through the workflow again. Here's the query:


Due to T150016, the tutorial step is not always logged, which makes the query a bit more complicated than it otherwise would be. We also count "broken" sessions where some steps are missing or out of order, which can happen if the user is on an unreliable network (or in case of bugs somewhere, I guess). If these counts are much higher than zero, the data from that period is not to be trusted.

Sessions where the user goes through the workflow successfully, then clicks "Upload more files" and does it again (and again and again) are counted as if they completed the workflow just once. Some users do this dozens of times, which we can't explain; some click that button only to immediately close the page, I guess because they feel obligated to choose between it and the button labelled "Go to wiki home page"? Either way, I think counting these as multiple completions would skew the results.
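The classification rules above (the ideal "tutorial,file,deeds,details,thanks" sequence, the possibly-missing tutorial step, "broken" out-of-order sessions, and collapsing "Upload more files" loops into a single completion) can be sketched in Python. This is a hypothetical rendering of the logic, not the actual SQL query:

```python
# Sketch of the session-classification rules described above. The real
# analysis is a SQL query over EventLogging data; this Python version is
# only an illustration of the same logic.

WORKFLOW = ["tutorial", "file", "deeds", "details", "thanks"]

def classify_session(steps):
    """Classify an ordered list of logged step names for one session.

    Returns "completed" if the user reached "thanks" (repeated loops via
    "Upload more files" count as a single completion), "broken" if steps
    are missing or out of order, otherwise the step the user gave up on.
    """
    # Due to T150016 the "tutorial" step may be missing; tolerate that.
    if steps and steps[0] != "tutorial":
        steps = ["tutorial"] + steps

    expected = 0  # index of the next step we expect to see
    for step in steps:
        if step == "file" and expected == len(WORKFLOW):
            # "Upload more files": the workflow restarts at the "file"
            # step. Count it as the same (single) completion.
            expected = 2  # as if "tutorial" and "file" were consumed
            continue
        if expected < len(WORKFLOW) and step == WORKFLOW[expected]:
            expected += 1
        else:
            return "broken"  # missing or out-of-order step
    if expected == len(WORKFLOW):
        return "completed"
    # Give-up point: the last step the user actually viewed.
    return steps[-1] if steps else "broken"
```

For example, a session logged as `["tutorial", "file", "deeds"]` classifies as a "deeds" give-up, while a full run followed by another "file,deeds,details,thanks" loop still classifies as a single "completed".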

We have reliable data since 2015-04-21 (until today, about a year and a half). There's older data going back about a year further, but due to bugs in the original logging code it's difficult to reconstruct sessions from it (session identifiers were being truncated or not recorded at all, and there's no reliable ordering of events within one session). I tried and the query including that is here:

but the results are noisy, so I'm ignoring them.

Raw data for the charts:

The data shows the number of user sessions each day in which the user finished on the given step. If they reached "thanks", the upload was successful; if they reached any other step, that is where they gave up.
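The chart data is just the per-session classifications grouped by day. A minimal sketch of that aggregation (hypothetical input shape, not the real data):

```python
# Group classified sessions into per-day counts by final step
# (hypothetical helper; the real aggregation happens in SQL).
from collections import Counter, defaultdict

def sessions_per_day(sessions):
    """sessions: iterable of (date_string, final_step) pairs.

    Returns {date: Counter mapping final step -> number of sessions}.
    A final step of "thanks" means a successful upload; any other step
    is where the user gave up.
    """
    by_day = defaultdict(Counter)
    for day, final_step in sessions:
        by_day[day][final_step] += 1
    return dict(by_day)
```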

chart1.png (850×1 px, 181 KB)

Fun observations:

  • There are more uploads in summer months. I guess snow is not photogenic.
  • The huge spikes are Wiki Loves Monuments contests every September.
  • There are a lot of "broken" sessions during WLM. I don't know why.

Here's the same data plotted as percentages of the total number of sessions (excluding "broken" ones). I also excluded all data from Septembers (WLM is a massive outlier) and added 90-day moving averages.
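The percentage and smoothing steps are straightforward; a minimal sketch, using made-up daily counts rather than the real numbers:

```python
# Sketch of the computations behind the second chart: per-day shares of
# the non-"broken" total, plus a trailing moving average for smoothing.
# (Hypothetical helpers; numbers in the doc come from the actual data.)

def daily_percentages(counts_by_step):
    """counts_by_step: dict of final step -> session count for one day.

    Returns each step's share of the non-"broken" total, in percent.
    """
    total = sum(n for step, n in counts_by_step.items() if step != "broken")
    return {step: 100.0 * n / total
            for step, n in counts_by_step.items() if step != "broken"}

def moving_average(values, window=90):
    """Trailing moving average; uses shorter windows at the series start."""
    out = []
    for i in range(len(values)):
        chunk = values[max(0, i - window + 1):i + 1]
        out.append(sum(chunk) / len(chunk))
    return out
```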

chart2.png (850×1 px, 238 KB)

Fun observations:

  • A bit more than 50% of users complete the task and upload files. We don't have similar data for other upload tools, so it's hard to tell whether that's good or bad (T77548).
  • But we can definitely tell that this rate has been steadily increasing. The number of users who quit in the "Details" or "Upload" steps has decreased to match.
  • I'm especially happy about the improvements in the "Details" step. We used to have tons of issues with things breaking horribly there, resulting in users losing their input and obviously not completing uploads. We've spent a lot of time resolving them, and it looks like that had a nice effect :D

(If you want to play with the charts, here's the LibreOffice Calc file:


If you read the old analyses, you probably noticed they claim that nearly 60% of users completed the task, which would mean we have regressed quite a bit. I think this figure is incorrect, caused by not correctly counting sessions where users go through the workflow more than once, and by "broken" sessions. If we used the same method to calculate survival rates today (

), we'd find that 103% of users reached the final step in September.
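Impossible rates like that fall out naturally when raw step views are compared instead of reconstructed sessions: every "Upload more files" loop adds another "thanks" view against a single "tutorial" view. A toy illustration with made-up sessions:

```python
# Toy illustration (made-up numbers) of why comparing step-view counts,
# rather than reconstructed sessions, can report a "survival rate" above
# 100%: "Upload more files" loops log extra "thanks" views.

sessions = [
    # Completed, then clicked "Upload more files" twice: three "thanks" views.
    ["tutorial", "file", "deeds", "details", "thanks",
     "file", "deeds", "details", "thanks",
     "file", "deeds", "details", "thanks"],
    # Gave up early.
    ["tutorial", "file"],
]

# Old method: compare raw step-view counts across all events.
tutorial_views = sum(s.count("tutorial") for s in sessions)  # 2
thanks_views = sum(s.count("thanks") for s in sessions)      # 3
old_rate = 100.0 * thanks_views / tutorial_views             # 150.0 -- over 100%!

# Session-based method: each session either completed or it didn't.
completed = sum(1 for s in sessions if "thanks" in s)        # 1
new_rate = 100.0 * completed / len(sessions)                 # 50.0
```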

chart_old_method_WRONG.png (540×881 px, 26 KB)