**Background and Intro**
In Wikimedia, [[ https://meta.wikimedia.org/wiki/Campaigns | campaigns ]] are activities run annually by many volunteer- and partner-led communities to encourage new and existing users to contribute images, data and information in categories like [[ https://wikimediafoundation.org/wikipedia20/wikimedia-campaigns/#section-1 | earth, science, monuments, art ]] etc. One of the key impacts of campaigns is that they introduce the Wikimedia projects to users who are new to the contributing side. Campaigns are considered an easy gateway for new users to get acquainted with the process of adding and modifying content across the Wikimedia projects, such as uploading a document/image to Wikimedia Commons or editing an article on Wikipedia. As new users enter the Wikimedia ecosystem through campaigns, it would be interesting to track the statistics of these users in order to understand user retention and quantify the impact of campaigns in this area.
**Introduction to the project**
With inspiration from the [[ https://wikiloves.toolforge.org/ | Wiki Loves stats tool ]], the idea is to develop a dashboard that can track and share retention metrics of participants, especially newcomers, after a particular campaign ends. For this, we need to monitor the contributions of the users across Wikimedia projects after the end of a particular campaign. Initially, our scope will be limited to new users from photo campaigns, understanding their retention across various Wikimedia projects at regular intervals after the end of the campaign: 3, 6, 9 and 12 months.
**Project Stages**
- Stage 1 - ETL pipeline and dataset prep: As we get started with the project, we will need to plan how to extract, transform and load the data before it can be put into the dashboard. We might initially start with a single campaign/category and scale up to 3-4 campaigns, preparing a dataset that tracks the contributions of participants across all Wikimedia projects over a period of 3, 6, 9 and 12 months after the campaign is over. This stage will include developing SQL queries to extract data from the MediaWiki MariaDB databases and applying the necessary transformations in Python to arrive at the required metrics.
- Stage 2 - Visualisations and Dashboarding: The prepared dataset needs to be visualized and a web app created that can identify new and existing users participating in a given campaign and track their contributions over the timeframes mentioned above. We are open about the final framework / library we will use to deploy; it depends on how the project needs emerge and the skills of the selected student. Some options are an HTML/CSS front-end with Flask/Django, [[ https://streamlit.io/ | Streamlit ]], and [[ https://dash.plotly.com/ | Dash ]].
- Stage 3: Time permitting, we will enable country-level filtering of the data, which can help us understand user retention metrics in a particular country.
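As a rough illustration of the Stage 1 metrics, the sketch below computes retention at the 3/6/9/12-month cutoffs, assuming per-user edit timestamps have already been extracted (e.g. via SQL against the MediaWiki replicas) into a pandas DataFrame. The column names and function name are hypothetical, not part of any existing tool.

```python
# Sketch: newcomer retention at fixed intervals after a campaign ends.
# Assumes a DataFrame of (user_id, timestamp) edit rows covering campaign
# participants; column names are hypothetical.
import pandas as pd

def retention_at_intervals(edits: pd.DataFrame, campaign_end: str,
                           months=(3, 6, 9, 12)) -> pd.Series:
    """Share of participants with at least one contribution between the
    campaign end and each cutoff (3, 6, 9, 12 months by default)."""
    end = pd.Timestamp(campaign_end)
    participants = edits["user_id"].nunique()
    rates = {}
    for m in months:
        cutoff = end + pd.DateOffset(months=m)
        active = edits[(edits["timestamp"] > end) &
                       (edits["timestamp"] <= cutoff)]["user_id"].nunique()
        rates[f"{m}m"] = active / participants
    return pd.Series(rates)

# Toy example: two participants, one returns within 3 months of the end.
edits = pd.DataFrame({
    "user_id": [1, 1, 2],
    "timestamp": pd.to_datetime(["2021-05-10", "2021-07-01", "2021-05-20"]),
})
print(retention_at_intervals(edits, "2021-05-31"))
```

In practice the same window logic could be pushed into the SQL layer; the pandas version is just the simplest place to start for a single campaign.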
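Stage 2 also requires splitting participants into new and existing users. One simple rule, sketched below under the assumption that account registration dates have been fetched (e.g. from the `user` table), is to label an account "new" if it was registered on or after the campaign start; the function and field names here are hypothetical.

```python
# Sketch: classifying campaign participants as newcomers vs. existing users
# by registration date. Names and the cutoff rule are illustrative only.
from datetime import date

def classify_participants(registrations: dict[str, date],
                          campaign_start: date) -> dict[str, str]:
    """Label each participant 'new' if the account was registered on or
    after the campaign start date, otherwise 'existing'."""
    return {user: ("new" if registered >= campaign_start else "existing")
            for user, registered in registrations.items()}

labels = classify_participants(
    {"Alice": date(2021, 5, 2), "Bob": date(2019, 1, 15)},
    campaign_start=date(2021, 5, 1),
)
print(labels)  # {'Alice': 'new', 'Bob': 'existing'}
```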
**Mentors**
- @Jayprakash12345 (creating web app and deployment)
- @KCVelaga (campaigns and data)
- //1-2 more to be confirmed//
**Skills required**
//Must haves//
- SQL and using Python for data analysis (pandas and NumPy libraries)
- Knowledge of at least one Python visualization library (Matplotlib, Seaborn, Plotly, Bokeh etc.) and willingness to learn others if required.
- Knowledge of HTML/CSS and Flask/Django (basic understanding is fine).
//Preferred//
- Experience with building data-related web applications
- Experience with big data tools such as Spark, Hive
- Basic knowledge of Kubernetes
**Getting started**
- [[ https://en.wikipedia.org/wiki/Wikipedia:Who_writes_Wikipedia%3F | Understanding how Wikipedia works ]]
- [[ https://commons.wikimedia.org/wiki/Commons:Welcome | Understand Wikimedia Commons ]]
- Understand what campaigns are and how they work:
- [[ https://meta.wikimedia.org/wiki/Campaigns | Basic introduction ]]
- [[ https://meta.wikimedia.org/wiki/Wiki_Loves_X_campaign | Wiki Loves X ]]
- [[ https://en.wikipedia.org/wiki/Wiki_Loves_Earth | Wiki Loves Earth ]] & [[ https://en.wikipedia.org/wiki/Wiki_Loves_Monuments | Wiki Loves Monuments ]]
- Get familiar with [[ https://www.mediawiki.org/wiki/Manual:Database_layout | MediaWiki database layout ]]
- Get familiar with the [[ https://www.mediawiki.org/wiki/API:Query | MediaWiki Query API ]]
**Micro-tasks**
//tba//