WMDE Banner Campaigns Dashboard
Open, Needs Triage, Public

Description

Develop the WMDE Banner Campaigns Dashboard that will unify and standardize all banner campaign reporting for the New Editors team:

  • standardize campaign archiving (this is almost completed);
  • "manually" enter all previous banner campaigns to the archive;
  • automate the acquisition of user revision datasets;
  • unify (a) campaign reporting with (b) the regular user edits reporting (T171420).

Event Timeline

All 2018 Banner Campaigns are now archived:

  • database: tools.db.svc.eqiad.wmflabs u16664__wdcm_p
  • tables:
mysql> SHOW TABLES LIKE '%campaign%';
+---------------------------------------+
| Tables_in_u16664__wdcm_p (%campaign%) |
+---------------------------------------+
| campaign_banner_impressions           |
| campaign_pageviews                    |
| campaign_registrations                |
| campaign_trainingmodules              |
+---------------------------------------+
4 rows in set (0.00 sec)

Dashboard update engine was completed recently:

  • now testing, and then
  • syncing with a CloudVPS instance that will host the dashboard.

@Stefan_Schneider_WMDE Always. In the meantime, I had to get back to the dashboard engine and introduce additional optimizations and checks to ensure the greatest possible consistency of our data. However, once done properly, it should serve us well in the future, I guess.

  • The dashboard is now operational and ready for testing: http://wdcm.wmflabs.org/WMDE_NewEditorsDashboard/
  • All comments, criticisms, and suggestions for new features and analytical outputs should be made here;
  • The dashboard integrates the formerly regularly updated New Editors Monthly Report with the WMDE Banner Campaign analytics;
  • All updates that can be run daily will be run daily as soon as we are finished with testing;
  • The update engine runs from stat1007 while the Dashboard itself is served from a CloudVPS instance wikidataconcepts;
  • The current, test version relies on the 2018 Banner Campaigns only; everything else will be added to the dashboard's back-end soon.

@Stefan_Schneider_WMDE @Verena @kai.nissen @RazShuty Your comments, please. Thank you.

@Addshore @Tobi_WMDE_SW @Jan_Dittrich Your comments on this dashboard would be appreciated. Many thanks!

@GoranSMilovanovic Thanks for providing the dashboard. I will reply as soon as I have checked it.

@GoranSMilovanovic I've had a look at it. It looks great and this is really good for us: having all information available at any time :)

I have a few ideas to improve it further:

New Editors Overview

  • Time to reach the 10th edit: I guess the following two data lines are compared: (1) time to reach the 10th edit from the beginning of time; (2) time to reach the 10th edit from the beginning of our campaigns. It would be interesting to see whether the time to reach an edit changes if we look at the same timeframes. What do you say? Maybe
  • Time to reach the 10th edit: It would be good to have the same scale on the horizontal axis. For example, both could use a scale of 1-200 days binned in 5-day intervals. I would change this in every category.

Campaigns

  • Is it possible to replace the abbreviated campaign tags with descriptive names? I guess that people who are new to the data would not understand the tags right away. So my suggestion would be: WMDE_2018_sprbt1 --> SpBC banner 01; WMDE_neweditors_autumn_lp1 --> AuBC first landing page; WMDE_neweditors_autumn_lpm --> AuBC first landingpage mobile; WMDE_neweditors_autumn_lpn --> AuBC Newsletter, and so on...
  • Could you also add a user distribution table at the end of every campaign? One table for right after the campaign, and another 30 days after the campaign.

Format:

edits   users
0-1     x
2-4     x
5-9     x
10-50   x
>50     x

Would it be possible to have such a table also in the Campaign Overview, containing data for all campaigns? That was an idea that just came up and would be awesome to have.

That is all from my side. @Christine_Domgoergen_WMDE, @Verena Please check it for yourselves and add other suggestions for improvement.

Well done @GoranSMilovanovic :)

@Stefan_Schneider_WMDE @Christine_Domgoergen_WMDE @Verena

@Stefan_Schneider_WMDE Thank you for your feedback. I would like to collect all feedback on this dashboard before introducing any changes. Thus I will be able to plan the dashboard re-design appropriately. Thanks.

Hi @GoranSMilovanovic ,
thank you for making the dashboard, this is really helpful for us! Here is my feedback.

New Editors Overview
Reaching the 10th edit:
This is a huge chart :-) It is very helpful that you split the data between "normal" users and users who registered via a campaign. Some ideas I have are:

  • Would it be possible to filter the data by year/month? Ideally it should be possible to answer questions like this: How many users reached the 10th edit in the last month/3 months/year etc.?
  • Is it possible to show the absolute numbers and also the average for a certain period of time in a table below the chart (or switch to table view, similar to the table view in the new Wikistats)? I think this is kind of the same idea Stefan mentioned for the campaigns.
  • Is it possible to compare the numbers with the numbers of registrations? Example question: how many of the users who created a user account in 2018/via the spring 2018 campaign/etc. reached the 10th edit (in absolute numbers and in percent)?

Reaching the 50th edit

  • Here the same ideas as for the chart above apply.

Time to reach the 10th and 50th edit

  • This is a great comparison, and the table with the compared numbers is very helpful. One question: the data includes only the number of editors who reach the 10th or the 50th edit, right? Can you also show how many users do not reach the 10th edit at all? Maybe in a different graph or in an extra column in the table?

Edits since January 2017

  • I guess these are the total number of edits per month?
  • the comparison in percent is great. Maybe we could even visualize this in a chart?

Registrations

  • Here it would also be helpful to be able to filter the data by year and month, have the absolute numbers and be able to compare it to the editors' productivity (see comments to chart 1 and 3).

Campaigns

  • a table with absolute numbers in addition to all charts would be very helpful (see Stefan's comment)
  • would it also be possible to additionally compare different campaigns in one chart and/or table?

Reaching the 10th/50th edit

  • is it possible to change the timeframe automatically according to the selected campaigns? So, for example, the chart for the spring campaign starts in the spring, etc.?
  • the color in the second chart is orange for no campaigns - this might be confusing, could it also be blue as in the other charts?

Time to reach the 10th edit

  • great that the training modules are included. Could you add the number of editors who didn't start or finish a module for comparison?

Edits since January 2017 per Campaign Banner

  • is it possible to include a key, so we know which banner it was? We can send you the text if you can make some space for it, maybe in Campaign Codes?

Campaign Archive

  • can you include the sum of the views, impressions etc. in a table below the charts?

Thank you!

There has already been extensive feedback. Not that much to add.

I just wonder about the data source and data itself. I wanted to use the numbers for the current quarterly report and compared with previous numbers I reported.

For example: At the end of September 2018 we had 53 people reaching their 10th edit. According to the dashboard now there are only 39? Is there an explanation for that?

However I will definitely need that number for all of 2018.

@Verena

For example: At the end of September 2018 we had 53 people reaching their 10th edit. According to the dashboard now there are only 39? Is there an explanation for that?

  1. not all our campaigns are available on the dashboard;
  2. the dashboard is still not running on a regular update schedule;
  3. and finally, this dashboard will definitely have to wait until all WDCM/Wikidata re-engineering tasks are completed (see my Phab board).

However, if you need any campaign data for your reports soon, please open a separate Phab ticket; I will do it as a one-shot analysis and deliver it to you as a Report. But please let me know well in advance if you need something like that, because I am currently under a heavy workload with the WDCM/Wikidata things. Thanks.

@Verena Please check out T212701#4854876.

As for the Dashboard: I will review all your requests once again and start the implementation as soon as I manage to sort out the mess created by a thorough re-engineering of our WDCM Wikidata usage statistics system. That should not take me too long, so you can expect the Dashboard to include the new features in the following two weeks or so. It will then be placed on a regular daily update schedule. Thanks.

Suggestions by @Stefan_Schneider_WMDE

Time to reach the 10th edit: I guess the following two data lines are compared: (1) time to reach the 10th edit from the beginning of time; (2) time to reach the 10th edit from the beginning of our campaigns. It would be interesting to see whether the time to reach an edit changes if we look at the same timeframes. What do you say? Maybe

Status: IMPLEMENTED.

Time to reach the 10th edit: It would be good to have the same scale on the horizontal axis. For example, both could use a scale of 1-200 days binned in 5-day intervals. I would change this in every category.

Status: IMPLEMENTED.
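
For illustration only, the shared-axis binning requested above can be sketched as follows. This is a minimal Python sketch, not the dashboard's actual R code; the function name bin_days and its parameters are assumptions made for this example. Every bin on the 1-200 day axis is always emitted, so two groups binned this way can be drawn against the same horizontal axis.

```python
from collections import Counter

BIN_WIDTH = 5    # days per bin, as suggested in the comment above
MAX_DAYS = 200   # upper end of the shared horizontal axis


def bin_days(days_to_edit, bin_width=BIN_WIDTH, max_days=MAX_DAYS):
    """Count users per fixed-width bin on a shared 1..max_days axis.

    Every bin is present in the result, even when empty, so two groups
    binned with the same arguments always share the same axis labels.
    Values outside 1..max_days are ignored in this sketch.
    """
    labels = [f"{start}-{start + bin_width - 1}"
              for start in range(1, max_days + 1, bin_width)]
    counts = Counter()
    for d in days_to_edit:
        if 1 <= d <= max_days:
            counts[(d - 1) // bin_width] += 1
    return {label: counts.get(i, 0) for i, label in enumerate(labels)}
```

Calling bin_days once for campaign users and once for all other users yields two dictionaries with identical keys, which is what makes the two charts directly comparable.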

Is it possible to replace the abbreviated campaign tags with descriptive names? I guess that people who are new to the data would not understand the tags right away. So my suggestion would be: WMDE_2018_sprbt1 --> SpBC banner 01; WMDE_neweditors_autumn_lp1 --> AuBC first landing page; WMDE_neweditors_autumn_lpm --> AuBC first landingpage mobile; WMDE_neweditors_autumn_lpn --> AuBC Newsletter, and so on...

Status: NOT IMPLEMENTED.
Rationale: I do not want to be the one who decides upon any descriptive names here. While I agree with you that descriptive names would serve us better than tags, the tags, not the descriptive names, are the data descriptors provided to me. My suggestion is for you to create a table of all campaign tags used thus far, add a new column called Description or something similar, fill in the descriptive names that you would like to use, and share the table with me. Then I can map the existing tags onto the descriptive ones.
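
Once such a table exists, the mapping itself is a simple lookup. The sketch below is in Python for illustration and is not the dashboard's code; it uses only the four tag/name pairs proposed in the suggestion above, and the helper name describe_tag is hypothetical. Unmapped tags fall back to the raw tag, so nothing disappears from a chart while the table is still incomplete.

```python
# Tag -> descriptive name pairs taken from the suggestion quoted above.
CAMPAIGN_TAG_NAMES = {
    "WMDE_2018_sprbt1": "SpBC banner 01",
    "WMDE_neweditors_autumn_lp1": "AuBC first landing page",
    "WMDE_neweditors_autumn_lpm": "AuBC first landingpage mobile",
    "WMDE_neweditors_autumn_lpn": "AuBC Newsletter",
}


def describe_tag(tag, names=CAMPAIGN_TAG_NAMES):
    """Return the descriptive name for a tag, or the tag itself if unmapped."""
    return names.get(tag, tag)
```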

Could you also add a user distribution table at the end of every campaign? One table for right after the campaign, and another 30 days after the campaign.
Format:
edits   users
0-1     x
2-4     x
5-9     x
10-50   x
>50     x

Status: IMPLEMENTED. Go to the Campaigns tab in the dashboard and find the section: Edits classes for the selected campaign(s). The way it works: as with everything else in this tab, if you select only one campaign, you get the data for that campaign only; if you select more than one campaign, you get the aggregated data for all selected campaigns.
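
The selection logic described above can be sketched roughly as follows. This is an illustrative Python sketch under stated assumptions, not the dashboard's R implementation: the input shape (a dictionary of per-user edit counts per campaign) and the names edit_class and edit_class_table are hypothetical. The class boundaries are the ones from the requested table.

```python
from collections import Counter

# Edit classes as requested: 0-1, 2-4, 5-9, 10-50, >50.
EDIT_CLASSES = [("0-1", 0, 1), ("2-4", 2, 4), ("5-9", 5, 9), ("10-50", 10, 50)]
OVERFLOW_CLASS = ">50"


def edit_class(n_edits):
    """Return the edit class label for a non-negative edit count."""
    if n_edits < 0:
        raise ValueError("edit count cannot be negative")
    for label, low, high in EDIT_CLASSES:
        if low <= n_edits <= high:
            return label
    return OVERFLOW_CLASS


def edit_class_table(edits_by_campaign, selected):
    """Count users per edit class, aggregated over the selected campaigns.

    edits_by_campaign: {campaign_code: {user: edit_count}} (assumed shape).
    selected: iterable of campaign codes; one code gives that campaign only.
    """
    counts = Counter()
    for campaign in selected:
        for n_edits in edits_by_campaign.get(campaign, {}).values():
            counts[edit_class(n_edits)] += 1
    labels = [label for label, _, _ in EDIT_CLASSES] + [OVERFLOW_CLASS]
    return {label: counts.get(label, 0) for label in labels}
```

Note that this sketch counts a user once per selected campaign; whether the dashboard de-duplicates users who appear in several campaigns is not stated in this thread.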

Would it be possible to have such a table also in the Campaign Overview, containing data for all campaigns? That was an idea that just came up and would be awesome to have.

Status: ALREADY THERE. Go to the New Editors Overview tab and look below the Edits since January 2017 section. I have added a title (Edit Classes) before these tables to make them easier to spot.

One table for right after the campaign, and another 30 days after the campaign.

Status: NOT IMPLEMENTED, BUT LET'S HOPE THE FUTURE WILL BE BRIGHTER AND BRING US NEW FEATURES.
Rationale: The dashboard's back-end update engine is already too complicated in terms of timestamp mappings, timezone adjustments, and similar operations. At this point, trying to implement the suggested feature would take too much time. However, since the dashboard's update engine will need some re-factoring sooner or later, I will try to implement that feature then. Ok?

@Christine_Domgoergen_WMDE I am still working on the implementation of some of the features that you have suggested. Thank you for your patience.

@Christine_Domgoergen_WMDE

Reaching the 10th edit:
Would it be possible to filter the data by year/month? Ideally it should be possible to answer questions like this: How many users reached the 10th edit in the last month/3 months/year etc.?

Status: IMPLEMENTED. I have replaced all static {ggplot2} charts with {dygraphs} time series, which enable you to select the time interval to focus on. Because the campaign vs. no campaign data differ in scale, I have included an optional checkbox in each chart to switch to logarithmic scaling. My advice would be to always use the log scale.

Is it possible to show the absolute numbers and also the average for a certain period of time in a table below the chart (or switch to table view, similar to the table view in the new Wikistats)? I think this is kind of the same idea Stefan mentioned for the campaigns.

Status: ALREADY IMPLEMENTED. There is a Data (csv) button that initiates a .csv file download. The full dataset is made available. Any additional statistics can easily be computed from the dataset in MS Excel, LibreOffice Calc, or any other spreadsheet software (e.g. Google Sheets). N.B. There are many datasets that can be downloaded from the dashboard for further inspection, calculation of averages, medians per time period, etc. Please experiment. Wherever you see a Data (csv) button, a dataset is available.
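
For those who prefer a script over a spreadsheet, a per-period average can be computed from a downloaded file along these lines. This is a hedged Python sketch: the column names month and users, the YYYY-MM date format, and the sample numbers are all assumptions made for the example, since the actual layout of the dashboard's CSV exports is not described in this thread.

```python
import csv
import io
from collections import defaultdict
from statistics import mean


def average_by_year(csv_text, date_col="month", value_col="users"):
    """Average value_col per calendar year.

    date_col is assumed to hold 'YYYY-MM' strings; both column names are
    hypothetical and must be adapted to the downloaded file.
    """
    by_year = defaultdict(list)
    for row in csv.DictReader(io.StringIO(csv_text)):
        by_year[row[date_col][:4]].append(float(row[value_col]))
    return {year: mean(values) for year, values in sorted(by_year.items())}


# Made-up sample data, for illustration only.
sample = "month,users\n2018-01,10\n2018-02,20\n2019-01,5\n"
print(average_by_year(sample))
```

To use it on a real download, read the file contents into a string first and pass the actual column names as arguments.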

Is it possible to compare the numbers with the numbers of registrations? Example question: how many of the users who created a user account in 2018/via the spring 2018 campaign/etc. reached the 10th edit (in absolute numbers and in percent)?

Status: ALREADY IMPLEMENTED. (1) Go to the Campaigns tab. (2) Select 2018_SpBC from the drop-down menu at the top of the page. (3) Click Update Analysis. The dashboard will generate and visualize all the data that you need to answer your question. All the datasets (registrations, user edits) can be downloaded and compared.

the color in the second chart is orange for no campaigns - this might be confusing, could it also be blue as in the other charts?

Status: COLOR CHANGED TO GREEN. BLUE WAS NOT USED, IN ORDER TO MAINTAIN CONSISTENCY: in all charts in the dashboard, the color scheme is: Campaign is green, No campaign is blue.
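
Once the registrations and user edits datasets are downloaded, the comparison in absolute numbers and in percent reduces to a set intersection. The sketch below is in Python and purely illustrative; the function name conversion and the idea of identifying users by a plain list of names or ids are assumptions, not a description of the exported files.

```python
def conversion(registered_users, users_reaching_threshold):
    """Return (absolute, percent) of registered users who reached a threshold.

    Only users present in registered_users are counted, so users who reached
    the threshold but registered elsewhere do not inflate the result.
    """
    registered = set(registered_users)
    reached = registered & set(users_reaching_threshold)
    absolute = len(reached)
    percent = 100.0 * absolute / len(registered) if registered else 0.0
    return absolute, percent
```

For example, with four registered users of whom two appear in the list of users who reached the 10th edit, the function returns 2 and 50.0.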

Campaign Archive
can you include the sum of the views, impressions etc. in a table below the charts?

Status: IMPLEMENTING NOW.

@Christine_Domgoergen_WMDE

Campaign Archive
can you include the sum of the views, impressions etc. in a table below the charts?

Status: IMPLEMENTED.

@Stefan_Schneider_WMDE @Verena @Christine_Domgoergen_WMDE @RazShuty

I hope that we can agree that the current selection of features on this dashboard provides quite a thorough overview of our Banner Campaigns.

In order to answer more specific questions, like:

Edits since January 2017 per Campaign Banner
is it possible to include a key, so we know which banner it was? We can send you the text if you can make some space for it, maybe in Campaign Codes?

or

would it also be possible to additionally compare different campaigns in one chart and/or table?

or

Can you also show how many users do not reach the 10th edit at all? Maybe in a different graph or in an extra column in the table?

I suggest you contact me directly and ask for the data and/or visualizations. The following are the reasons why I think that would be the best course of action:

  • first, we will still have campaign specific reports provided as notebooks for every new campaign, and such questions can be answered there;
  • second, trying to develop a dashboard for our campaigns that provides an answer to each and every imaginable research question will lead us straight into an endless dev cycle here;
  • beyond that, the dashboard is already complicated and heavy on client-side processing, and adding more features will certainly start to overload the screen with information and graphics.

I still need to get back to the dashboard's update engine, re-factor it to work with WMF's new MySQL storage, and then put it on a regular daily update. @Stefan_Schneider_WMDE I will try to implement one additional feature that you have asked for (edits 30 days after campaign) in that step, because such data are pre-processed in production (stat1007, I think) and not on the dashboard directly.

  • Update engine re-factored;
  • xml config provided;
  • test run from crontab on stat1007 starts at 19:00 UTC.
  • Strange behavior: beeline calls result in No current connection when run from crontab in production;
  • when I run the R script interactively (SSH to stat1007) no errors are reported;
  • need to inspect this with someone from WMF Analytics;
  • until then, updates will be run manually.
  • Next step: CloudVPS update component; then,
  • Documentation.
  • CloudVPS update component: DONE.
  • Next step: Documentation (+ the crontab problem on stat1007).
  • More campaigns added to the Dashboard;
  • N.B. these new campaigns are not yet available from the Campaign Archive tab (the dashboard will report an error upon the selection of any of them).
  • A table encompassing (a) all campaign codes as used in the dashboard, (b) descriptive, "real" campaign names (e.g. 2018 Summer Banner Campaign), and (c) all banners (event_campaign) used in each campaign is now provided in the Campaign Codes tab.

@Stefan_Schneider_WMDE If you would like to have descriptive banner names used across the dashboard, you can use this table as a starting point, provide a descriptive name for each banner, and send it to me; I would then map it onto the current event_campaign values in the dashboard.

  • Dashboard documentation completed;
  • Documentation tab added to the dashboard.

Hi @GoranSMilovanovic thank you a lot for the updates and your work! We will have a closer look at your comments and the implementation next week and get back to you.

  • The problem with the dashboard update engine described in T209055#5095366 is fixed;
  • test run now, monitoring, status: so far, so good.
  • Test run: failed on a single {dplyr} join operation (operand field missing);
  • debugging.

Test run: failed on a single {dplyr} join operation (operand field missing);

  • seems to be related to the Sys.setlocale("LC_TIME", "C") used in the update engine;
  • also - less probable - could be a problem of {data.table} functions masking some of the {lubridate} functionality;
  • still working to resolve the issue.
  • Update engine fixed;
  • running from crontab on stat1007, UTC 20:00, daily;
  • since the CloudVPS component is already live, the dashboard is now updated daily.
  • gerrit repo requested;
  • once we have the repo, this goes into production;
  • some minor polishing will be necessary, as well as
  • some manual pre-processing of older (2017) banner campaigns.

@GoranSMilovanovic Hi Goran, I was just trying to look at the dashboard because we are once again set on standardizing our metrics and are looking at possibilities to make them easily available and understandable for the team. I get a Bad Gateway error, though. Is the dashboard still online and working?

@Christine_Domgoergen_WMDE @Tobi_WMDE_SW

That dashboard has not been online or updated for some time. Please get in touch with @Verena, who can provide the details.

To put it in a nutshell:

  • that dashboard was very heavy on the back-end ETL/data processing side; also
  • at some point, changes were made in the core MediaWiki databases, upon which that dashboard depends, but they were of a transitory nature, so we decided to wait until that situation settles and then re-work the dashboard's ETL back end.

Reviving this project now would take a huge amount of time and effort, and I do not think it is feasible given the time that I have to invest in Wikidata-related tasks. Please take into consideration that any action towards reviving this project will have to be carefully planned in advance.

Also,

... because we are once again set on standardizing our metrics and are looking at possibilities to make them easily available and understandable for the team

this will be very difficult to achieve given the variety of methodological approaches that were attempted in our new editors campaigns since 2017, when I joined. I would suggest taking an alternative route and setting up a CloudVPS instance to host all new editors campaign reports in the same place as an archive of past campaigns. Additionally, with all the experience gathered in the previous years, maybe now is the time to think about standardizing the metrics for future campaigns, and not for past ones.