
comprehensive report 2021
Closed, Resolved (Public)

Description

New editors - WMDE deep dive 2 - analytics questions

Reference Document: New editors - WMDE deep dive 2 - analytics questions

Briefing: 27th October 2021
Delivery of report: asap
Previous Task

General requirements
The report should be delivered as tables and charts in HTML, as in previous reports.
The report might be made publicly available in the future. This is not a requirement for the current delivery deadline.
Communication will happen in Phabricator.
The time span for this report is January 2017 until September 2021.

Analytics areas
Organic development of editor numbers
Since January 2017: how many users actively registered in de-WP? NOTE: mind the user_self_made field in wmf.mediawiki_history
How many of them edited (since registration until 30th September 2021):
1 edit
2 to 4 edits
5 to 9 edits
10 to 49 edits
50 or more edits
(please indicate if the higher edit classes are included in the lower edit classes)
Retention rate: how many newly registered users are still active (active = at least 1 edit) after:
1 month after registration
6 months after registration
How high is the retention rate of these active users compared to the number of registrations?
How many edits were made by these users in total in the given time period (articles only)?
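
The edit classes requested above can be counted with a small helper. This is a minimal sketch, not the code used for the report: it assumes the classes are mutually exclusive (a user with 12 edits is counted only in "10 to 49 edits"), which is one possible answer to the question about higher classes being included in lower ones. The names `EDIT_CLASSES`, `edit_class` and `count_edit_classes` are hypothetical.

```python
from collections import Counter

# Edit classes from the task description: (label, lower bound, upper bound).
# An upper bound of None means "no upper limit".
EDIT_CLASSES = [
    ("1 edit", 1, 1),
    ("2 to 4 edits", 2, 4),
    ("5 to 9 edits", 5, 9),
    ("10 to 49 edits", 10, 49),
    ("50 or more edits", 50, None),
]


def edit_class(n_edits):
    """Return the label of the class n_edits falls into, or None for zero edits."""
    for label, low, high in EDIT_CLASSES:
        if n_edits >= low and (high is None or n_edits <= high):
            return label
    return None


def count_edit_classes(edit_counts):
    """Count users per edit class; users with zero edits are not counted."""
    counts = Counter(edit_class(n) for n in edit_counts)
    return {label: counts.get(label, 0) for label, _, _ in EDIT_CLASSES}
```

If cumulative classes are wanted instead (for example "5 or more edits"), they can be derived by summing the exclusive counts from the chosen class upwards.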

Direct campaigning effects
All previous campaigns should be covered. For the following we need the sum and the individual numbers per campaign:
Since January 2017: how many users actively registered in de-WP?
How many of them edited (since registration until 30th September 2021):
1 edit
2 to 4 edits
5 to 9 edits
10 to 49 edits
50 or more edits
Retention rate: how many newly registered users are still active (active = at least 1 edit) after:
1 month after registration
6 months after registration
How high is the retention rate of these active users compared to the number of registrations?
How many edits were made by these users in total in the given time period (articles only)?
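
The per-campaign retention numbers and their sum could be computed along the following lines. This is a sketch under stated assumptions: "active after N months" is read here as "made at least one edit on or after the registration date plus a given number of days" (for example 30 days for 1 month), which the task description does not spell out. The function names and the input layout are hypothetical.

```python
from datetime import date, timedelta


def is_retained(registered, edit_dates, days):
    """True if the user made at least one edit `days` or more after registering."""
    threshold = registered + timedelta(days=days)
    return any(d >= threshold for d in edit_dates)


def retention_by_campaign(users, days):
    """Aggregate retention per campaign plus a total over all campaigns.

    users: iterable of (campaign, registration_date, list_of_edit_dates).
    Returns {campaign: {"registered": int, "retained": int, "rate": float}}.
    """
    totals = {}
    for campaign, registered, edit_dates in users:
        n_reg, n_ret = totals.get(campaign, (0, 0))
        totals[campaign] = (
            n_reg + 1,
            n_ret + int(is_retained(registered, edit_dates, days)),
        )
    # The requested sum over all campaigns, computed before adding its own key.
    all_reg = sum(r for r, _ in totals.values())
    all_ret = sum(k for _, k in totals.values())
    totals["all campaigns"] = (all_reg, all_ret)
    return {
        c: {"registered": r, "retained": k, "rate": k / r if r else 0.0}
        for c, (r, k) in totals.items()
    }
```

The rate is the number of retained users divided by the number of registrations, which matches the comparison asked for in the description.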

Event Timeline

Verena updated the task description.

@Verena I can pick this one up if you want, but due to WikidataCon 2021 there is no chance I will start working on this before Monday, November 1.

Quick answer! Oh no, you are right :(

Can you give an estimate on the amount of time required?

@Verena

All previous campaigns should be covered.

One to two days, I guess, not more. If everything goes alright. And it typically does.

NOTE: mind the user_self_made field in wmf.mediawiki_history

I guess you mean user_is_created_by_self and event_user_is_created_by_self.
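
For illustration, a query selecting self-created registrations might be built as below. This is only a sketch: apart from user_is_created_by_self, which is named above, the column names (snapshot, wiki_db, event_entity, event_type, event_timestamp, user_id) and the 'dewiki' value are assumptions about the wmf.mediawiki_history schema and should be checked against the table before use. The function name is hypothetical.

```python
def registrations_query(snapshot, wiki_db="dewiki",
                        start="2017-01-01", end="2021-10-01"):
    """Build a SQL string selecting self-created user registrations.

    Assumption: user-creation events are rows with event_entity = 'user'
    and event_type = 'create'; verify this against the actual schema.
    """
    return f"""
    SELECT user_id, event_timestamp
    FROM wmf.mediawiki_history
    WHERE snapshot = '{snapshot}'
      AND wiki_db = '{wiki_db}'
      AND event_entity = 'user'
      AND event_type = 'create'
      AND user_is_created_by_self = true
      AND event_timestamp >= '{start}'
      AND event_timestamp < '{end}'
    """
```

The resulting string could then be passed to whichever SQL engine is used on the Analytics Clients; the half-open date range covers registrations up to and including 30th September 2021.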

@Verena

  • the Report is delivered via e-mail
  • recommendations on how to ensure the future consistency of this report are also provided via e-mail.

@Verena @Tobi_WMDE_SW

The decision is to productionize this report so that it is updated monthly (i.e. with every new snapshot of wmf.mediawiki_history).

The task is currently not a priority. Progress will be tracked in this ticket. Here is a roadmap:

  • Step 1: rewrite the analytics code for basic indicators only (i.e. those where the inconsistencies can manifest) from R to PySpark.
  • Step 2: productionize the current analytics code in R via our {WMDEData} library + {renv} and deploy on the Analytics Clients (stat1008, perhaps; stat1005 already suffers from the many processes that run there)
  • Step 3: re-run the analytics code for all available snapshots of the wmf.mediawiki_history table (that would be six months before now)
  • Step 4: develop a new version of the Comprehensive Report in R markdown and deploy to CloudVPS via Docker/Shinyproxy to (a) be able to automatically update it from the Analytics Client side produced datasets and (b) serve online.
  • Step 2. productionize the current analytics code in R via our {WMDEData} library + {renv} and deploy on the Analytics Clients (stat1008, perhaps; stat1005 already suffers from the many processes that run there) - DONE.
  • Step 3. re-run the analytics code for all available snapshots of the wmf.mediawiki_history table (that would be six months before now) - ready, RUN this 2021/11/24.

The consistency check is now completed:

Step 3. re-run the analytics code for all available snapshots of the wmf.mediawiki_history table (that would be six months before now) - ready, RUN this 2021/11/24.

The ETL/Analytics code will now be deployed to stat1008 and run with every new snapshot of the wmf.mediawiki_history table.
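
A consistency check across snapshots, as mentioned above, could look roughly like this. It is a sketch, not the check that was actually run: it assumes that "inconsistency" means the same historical indicator (for example registrations in a past month) taking different values in different snapshots. The function name and the input layout are hypothetical.

```python
def inconsistent_indicators(values_by_snapshot, tolerance=0):
    """Return indicators whose values differ across snapshots.

    values_by_snapshot: {snapshot: {indicator_name: numeric_value}}.
    An indicator is flagged when max - min across snapshots exceeds tolerance.
    """
    # Regroup the values so that each indicator maps to its per-snapshot values.
    by_indicator = {}
    for snapshot, indicators in values_by_snapshot.items():
        for name, value in indicators.items():
            by_indicator.setdefault(name, {})[snapshot] = value

    flagged = {}
    for name, per_snapshot in by_indicator.items():
        values = per_snapshot.values()
        if max(values) - min(values) > tolerance:
            flagged[name] = per_snapshot
    return flagged
```

An empty result means every indicator was stable across the snapshots that were compared.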

@Tobi_WMDE_SW I need a hand on a CloudVPS where I intend to serve this report from.

I've set up a proxy for this instance, default nginx port 80, but I am not able to see anything on https://www.wmde-dashboards.wmcloud.org/

@Verena The update engine is tested and works; we have a new snapshot of wmf.mediawiki_history and also a new version of the Comprehensive Report.

Serving the report from the wmde-dashboards.wmcloud.org CloudVPS instance should not be a problem - please just give some time to @Tobi_WMDE_SW and me to see what to do about T294485#7548497. Thank you.

@Tobi_WMDE_SW

  • Please discard T294485#7548498.
  • I have created a new CloudVPS instance from the resources assigned to the existing virtual instance (nothing useful was running there anyway).
  • The new one, wmde-neweditors, will host this project for the WMDE New Editors team.

@Verena @Tobi_WMDE_SW

The New Editors Comprehensive Report is now served from

https://wmde-neweditors.wmcloud.org/app_direct/WMDENE_docs/2021_WMDE_CampaignsReview.nb.html

Note: The URL will change soon (I will remove "2021" and keep only the "WMDE_CampaignsReview.nb.html" part).

ToDo: develop a small R update script to update the Docker image periodically.

@Verena @Tobi_WMDE_SW

Here it is - please bookmark this:

https://wmde-neweditors.wmcloud.org/app_direct/WMDENE_docs/WMDE_NewEditors_ComprehensiveReport.nb.html

This is how it works:

  • as soon as a new snapshot of wmf.mediawiki_history is available,
  • a PySpark script runs from stat1008 in the Analytics Cluster, followed by
  • data pre-processing and aggregation in R;
  • the data are made publicly available (sanitised; no user ids, user names, etc.) from https://analytics.wikimedia.org/published/datasets/wmde-analytics-engineering/WMDE_NewEds_Comprehensive/;
  • the Report component runs in CloudVPS, where an R orchestration script runs every day at 22:00 CET
  • and re-builds the Docker image of the Report, so as soon as the data are updated from stat1008
  • the Report automatically updates to reflect the changes.
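
The trigger in the first step (run when a new snapshot is available) could be sketched as below. This is an illustration only and not the production code, which is written in PySpark and R: it assumes that snapshots are identified by 'YYYY-MM' strings appearing as `snapshot=YYYY-MM` in partition names, and that the last processed snapshot is stored somewhere by the caller. All function names are hypothetical.

```python
import re

# Assumed partition naming, e.g. "snapshot=2021-10/wiki_db=dewiki".
SNAPSHOT_RE = re.compile(r"snapshot=(\d{4}-\d{2})")


def latest_snapshot(partitions):
    """Return the most recent 'YYYY-MM' snapshot found, or None if there is none."""
    found = [m.group(1) for p in partitions if (m := SNAPSHOT_RE.search(p))]
    # 'YYYY-MM' strings sort chronologically, so max() gives the newest one.
    return max(found) if found else None


def should_run(partitions, last_processed):
    """True if a snapshot newer than last_processed is available."""
    latest = latest_snapshot(partitions)
    if latest is None:
        return False
    return last_processed is None or latest > last_processed
```

For example, with partitions for 2021-09 and 2021-10 and a last processed snapshot of "2021-09", `should_run` returns True, and the ETL step would then be started.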

Now we will always have fresh data on this. Thank you for your patience.

@Verena There are some missing numbers in the version of the report that you will find online at https://wmde-neweditors.wmcloud.org/app_direct/WMDENE_docs/WMDE_NewEditors_ComprehensiveReport.nb.html; I will take care of it, don't worry.

@Verena

From T294485#7548804

There are some missing numbers in the version of the report that you will find online

This is fixed now; the changes should be visible online following the next daily system update.

Thank you!


@Verena @Tobi_WMDE_SW

I guess this ticket should have already been resolved?