**Background and Intro**
In Wikimedia, [[ https://meta.wikimedia.org/wiki/Campaigns | campaigns ]] are activities run annually by many volunteer and partner-led communities to encourage new and existing users to contribute images, data and information in categories such as [[ https://wikimediafoundation.org/wikipedia20/wikimedia-campaigns/#section-1 | earth, science, monuments, art ]], and more. A key impact of campaigns is that they introduce the Wikimedia projects to users who are new to contributing. Campaigns are considered an easy gateway for new users to get acquainted with adding and modifying content across the Wikimedia projects, for example uploading a document or image to Wikimedia Commons or editing an article on Wikipedia. As new users enter the Wikimedia ecosystem through campaigns, tracking their later activity would help us understand user retention and quantify the impact of campaigns in this area.
Inspired by the [[ https://wikiloves.toolforge.org/ | Wiki Loves stats tool ]], the idea is to develop a dashboard that tracks and shares retention metrics for campaign participants, especially newcomers, after a campaign ends. For this, we need to monitor users' contributions across Wikimedia projects after the end of a particular campaign. Initially, the scope will be limited to new users from photo campaigns, measuring their retention across Wikimedia projects at regular intervals after the campaign ends: 3, 6, and 12 months.
**Project Stages**
- Stage 1 - ETL pipeline and dataset prep: To get started, we will need to plan how to extract, transform and load the data before it can be put into the dashboard. We might start by exploring a single campaign/category and then scale up to 3-4 campaigns. This stage will include developing SQL queries to extract data from MediaWiki MariaDB databases and applying the necessary transformations in Python to arrive at the required metrics. The goal is to prepare a dataset for a single campaign that tracks participants' contributions across all Wikimedia projects at 3, 6, 9 and 12 months after the campaign is over.
- Stage 2 - Visualisations and Dashboarding: The prepared dataset needs to be visualized, and a web app created to present it. We are open about the final framework/library used for deployment; it depends on how the project needs emerge and on the skills of the selected student. Some options are an HTML/CSS front-end with Flask/Django, [[ https://streamlit.io/ | Streamlit ]], and [[ https://dash.plotly.com/ | Dash ]].
- Stage 3: Time permitting, we will enable country-level filtering of the data, which can help us understand user retention metrics in a particular country.
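As a rough illustration of the Python transformation step in Stage 1, the sketch below computes a retention share per time window from a table of contributions. Everything specific here is an assumption made for illustration: the column names `user` and `timestamp`, the function name `retention_by_window`, and the definition of retention itself (a participant counts as retained at N months if they made at least one contribution between the campaign end and N months later). The actual metric definitions and data layout are to be decided during the project.

```python
import pandas as pd


def retention_by_window(contribs, participants, campaign_end, months=(3, 6, 12)):
    """Share of participants with at least one contribution in each window.

    contribs     -- DataFrame with hypothetical columns `user` (str) and
                    `timestamp` (datetime64), one row per contribution
    participants -- iterable of user names who took part in the campaign
    campaign_end -- end date of the campaign (anything pd.Timestamp accepts)
    months       -- window lengths, in months after the campaign end
    """
    participants = set(participants)
    end = pd.Timestamp(campaign_end)

    # Keep only contributions made by participants after the campaign ended.
    is_participant = contribs["user"].isin(participants)
    after_end = contribs["timestamp"] > end
    post = contribs[is_participant & after_end]

    result = {}
    for m in months:
        cutoff = end + pd.DateOffset(months=m)
        # Count distinct participants active at any point up to the cutoff.
        active = post.loc[post["timestamp"] <= cutoff, "user"].nunique()
        result[m] = active / len(participants) if participants else 0.0
    return pd.Series(result, name="retention")
```

A call such as `retention_by_window(df, ["A", "B", "C"], "2021-06-30")` returns a Series indexed by window length in months. Note that the windows in this sketch are cumulative; a stricter definition (for example, activity within each separate interval) would need a different filter.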
**Mentors**
- @Jayprakash12345
- @KCVelaga
- //1-2 more to be confirmed//
**Skills required**
//Must haves//
- SQL, and Python for data analysis (pandas and NumPy libraries)
- Knowledge of at least one Python visualization library (Matplotlib, Seaborn, Plotly, Bokeh etc.) and willingness to learn others if required.
- Knowledge of HTML/CSS and Flask/Django (basic understanding is fine).
//Preferred//
- Experience with building data-related web applications
- Experience with big data tools such as Spark, Hive
- Basic knowledge of Kubernetes
**Time commitment / Difficulty**
- 350 hours with medium complexity
**Getting started**
- [[ https://en.wikipedia.org/wiki/Wikipedia:Who_writes_Wikipedia%3F | Understanding how Wikipedia works ]]
- [[ https://commons.wikimedia.org/wiki/Commons:Welcome | Understand Wikimedia Commons]]
- Understand what campaigns are and how they work:
  - [[ https://meta.wikimedia.org/wiki/Campaigns | Basic introduction ]]
  - [[ https://meta.wikimedia.org/wiki/Wiki_Loves_X_campaign | Wiki Loves X ]]
  - [[ https://en.wikipedia.org/wiki/Wiki_Loves_Earth | Wiki Loves Earth ]] & [[ https://en.wikipedia.org/wiki/Wiki_Loves_Monuments | Wiki Loves Monuments ]]
- Get familiar with [[ https://www.mediawiki.org/wiki/Manual:Database_layout | MediaWiki database layout ]]
- Get familiar with [[ https://www.mediawiki.org/wiki/API:Query | MediaWiki Query API ]]
**Micro-tasks**
- Microtask 1: T304974
//If you have any questions, post them on this ticket or ask on the Zulip stream// `#gsoc2022: campaign retention metrics dashboard`