Data analysis for the Donations/Membership data sets.
Reference doc: Documentation Export Donation Data
GoranSMilovanovic | May 21 2018, 9:13 AM
@Jan_Dittrich and @GoranSMilovanovic notes:
https://phabricator.wikimedia.org/T194744
@kai.nissen @Jan_Dittrich @Tobi_WMDE_SW
Next steps:
@kai.nissen @Jan_Dittrich @Tobi_WMDE_SW
Here's a detailed technical report. Please let me know if you need any additional sections. (I didn't bother to visualize the differences between group means for the t-tests, as I find that trivial; if you wish to take a look at the distributions per campaign, just let me know.)
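For reference, a minimal sketch of what each of those per-campaign comparisons boils down to, assuming a data frame `donations` with hypothetical columns `amount`, `design` (old/new page) and `campaign` - illustrative placeholders, not the actual schema of the exported data:

```
# Two-sample t-test of donation amount by page design, within one campaign;
# the data frame and column names are illustrative placeholders.
t.test(amount ~ design,
       data = subset(donations, campaign == "38-ba-171223"))
```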
So the outcome is: it is likely that there were no changes except for the opt-in.
As a reminder, this is what the old opt-in looked like:
and this is the new one:
@Jan_Dittrich Yes, that would be the conclusion.
A question for you: was the opt-in page always the same, in every campaign? Your posting of only two different designs seems to imply that it was.
If this needs a rework in terms of a campaign-wise analysis in the sense given above, we can have it in no time: it is the same R code with one filter removed per analysis, as in the sketch below.
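To make "one filter removed" concrete, a minimal sketch under the same hypothetical `donations` schema (columns `campaign`, `design`, `optIn`):

```
# Cross-campaign analysis: each campaign filtered out and tested separately.
for (cmp in unique(donations$campaign)) {
  sub <- donations[donations$campaign == cmp, ]   # the campaign filter
  print(chisq.test(table(sub$design, sub$optIn)))
}

# Campaign-wise analysis: the same test with the campaign filter removed,
# i.e. data from all campaigns pooled together.
chisq.test(table(donations$design, donations$optIn))
```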
I think so: There were two opt-in pages (old design, new design) which were used with several different banners – correct, @kai.nissen, @tmletzko?
So, if I get it right: the controlled variable would be new design/old design, the outcome would be the number of opt-ins/opt-outs, and the different campaigns would be a factor that varies but is not itself of interest.
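One standard way to formalize such a design - an illustration under the hypothetical column names used above, not necessarily the analysis that was actually run - is to treat campaign as a stratifying nuisance factor, or to enter it as a covariate:

```
# Campaign as a stratum: Cochran-Mantel-Haenszel test of page design vs.
# opt-in, controlling for campaign (hypothetical column names).
mantelhaen.test(table(donations$design, donations$optIn, donations$campaign))

# Campaign as a covariate: logistic regression of opt-in (0/1) on design,
# adjusting for campaign.
fit <- glm(optIn ~ design + campaign, family = binomial, data = donations)
summary(fit)
```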
Hi @GoranSMilovanovic and @Jan_Dittrich. I've got some comments and questions on the analysis.
Thank you for your comments!
That implies that we also need a campaign-wise analysis (all the data from all campaigns pooled together) in place of the cross-campaign analysis (what we have now: each campaign analyzed separately). To be delivered this evening.
-" There is one thing about the tracking codes: As I wrote in https://phabricator.wikimedia.org/T194744 in campaign "38-ba-171223" the old landing page has a "var" tracking code. I just want to check if you accounted for that."
I think so, yes: the coding scheme was provided with that exception mentioned, and it was used directly in R to recode the data, roughly as in the sketch below.
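The sketch assumes - for illustration only - that a "var" tracking code marks the new landing page everywhere except in campaign "38-ba-171223", where it marks the old one; the actual coding scheme may differ:

```
# Recode tracking codes into old/new page, honouring the documented
# exception for campaign "38-ba-171223"; the rule that "var" otherwise
# marks the new page is an assumption made for this sketch.
donations$design <- ifelse(grepl("var", donations$trackingCode), "new", "old")
exception <- donations$campaign == "38-ba-171223"
donations$design[exception] <-
  ifelse(grepl("var", donations$trackingCode[exception]), "old", "new")
```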
Please be a bit more specific about the "correlations in the data set" that you would like to investigate. The exploratory data analysis - in terms of looking at the distributions, cross-tabulations and similar - will be easy to complete.
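For example, a cross-tabulation of that kind is a one-liner under the hypothetical `donations` schema from above:

```
# Opt-in counts broken down by campaign and page design
# (hypothetical column names).
with(donations, table(campaign, design, optIn))
```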
Again, thank you for your comments and the additional info that you have provided.
@GoranSMilovanovic Ah, I thought the analysis was campaign-wise. OK, looking forward to seeing the results.
> Please be a bit more specific about the "correlations in the data set" that you would like to investigate.
I would like to ping @Jan_Dittrich here, who suggested doing such an analysis. The background was to find clues for understanding why the opt-in rate and the rate of donations with addresses decreased with the new landing page, to further understand user behavior on the landing page in general, and to better understand the factors influencing the performance/success of a landing page.
@Tobias_Schumann_WMDE @Jan_Dittrich @kai.nissen
Donations Data Set
Campaign-wise analysis (i.e. all the data on donation amounts and opting in analyzed together, no matter which particular campaign they originated from):
@Tobias_Schumann_WMDE @Jan_Dittrich @kai.nissen
Membership Data Set, campaign-wise analyses
In other words, the results for both data sets do not differ if we switch from the cross-campaign analysis (i.e. every campaign analyzed separately) to the campaign-wise analysis (i.e. data from all campaigns pooled together). No effect of old vs. new page. The only useful finding is the effect of campaign on the donation amount reported in T195242#4301115.
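For reference, a sketch of how such a campaign effect on the donation amount can be tested - hypothetical column names again, and not necessarily the exact code behind the result referenced above:

```
# Donation amounts are typically heavily right-skewed, so a rank-based test
# is shown alongside the one-way ANOVA.
kruskal.test(amount ~ campaign, data = donations)
summary(aov(amount ~ campaign, data = donations))
```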
@Tobias_Schumann_WMDE @Jan_Dittrich @kai.nissen
The Exploratory Data Analysis visualizations are found in the Exploratory Data Analysis sections for the Donations and the Membership data sets respectively:
I have focused here on visualizing the distributions, per the experimental old vs. new factor as well as per campaign. In my judgment, the EDAs are truly illustrative with respect to the results of the previously reported statistical hypothesis tests, which revealed no effects of the experimental factor either across the campaigns or campaign-wise.
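A sketch of the kind of plot meant here, assuming ggplot2 and the hypothetical `donations` schema used in the earlier sketches:

```
library(ggplot2)

# Donation amount distributions per campaign, split by page design;
# log scale because the amounts are strongly right-skewed.
ggplot(donations, aes(x = campaign, y = amount, fill = design)) +
  geom_boxplot() +
  scale_y_log10() +
  theme(axis.text.x = element_text(angle = 45, hjust = 1))
```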
In my judgment, you should not focus on quantitative results beyond this point. Of course, there are always some additional analyses that could be performed. However, after what I have seen thus far, I really do not think that anything beyond what we already have could provide any additional insight.
You have mentioned some "correlations" in the data sets that could be interesting to see. If you could be a bit more specific about what you mean by "correlations" in the context of the present analysis, I could probably provide those as well. But I think that by comparing the boxplots - or simply by looking at the distributions of amounts (be it donations or membership fees) across the campaigns - you will see why we have observed a consistent no-effect finding.
I find the following comments from @Tobias_Schumann_WMDE very helpful as guidance for the analytical work:
> However, there were important differences between the campaigns: different banners leading to the landing page (I would say less important to consider) and different webpages / different devices where the banners were displayed. The campaigns were shown on de.wikipedia.org (desktop), en.wikipedia.org (desktop), wikipedia.de (desktop) and de.m.wikipedia (one campaign for all mobile devices except iPad and one campaign for iPad). So the big difference, I would say, is the different devices, e.g. the screen sizes that are used [...] And that's why we usually analyse each campaign for itself. Finally, the new landing page - which was especially designed for the mobile experience - could perform better on mobile but worse on desktop. Still, it is very interesting to see that you couldn't find a difference in performance throughout all campaigns.
But you know, as I know, that strict analytical procedures rely on observable data and well-defined experimental designs - while the data sets that were provided contain no information on the type of device. In that respect, having richer data sets in the future could help us to better understand the effects of our campaigns - or their failures. Having a failed campaign is not necessarily bad, as long as we can learn something from it.
Please let me know if you have any questions about what has been provided thus far, or if you think that any of the reported findings call for additional interpretation.