===Profile Information
**Name:** Akash
**Zulip username:** akashsuper2000
**Web Profile:** https://akashsuper2000.github.io/
**Resume:** https://akashsuper2000.github.io/resume.pdf
**Location:** Chennai, India
**Typical working hours:** IST
**Proposal document:** https://phabricator.wikimedia.org/T306268
===Synopsis
=====Short summary describing your project and how it will benefit Wikimedia projects
**Project document:** https://phabricator.wikimedia.org/T304826
**Proposal document:** https://phabricator.wikimedia.org/T306268
Campaigns are an integral part of the Wikimedia community, aimed at encouraging new and existing users to contribute data and information to the repositories. It is therefore essential to understand the impact of such campaigns and the retention of the users they bring in. The goal of this project is to develop a metrics dashboard that provides insights into user retention over different time intervals.
To achieve this, an ETL pipeline should be built that imports data from relevant sources, such as MariaDB or the data dumps, and processes it into a graph-feedable format. Insightful graphs are then created from this data and sent to the front-end for display, served by a web application built on a framework like Flask.
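A minimal sketch of the intended extract-transform-load flow, assuming pandas as the data container; the function names, column names, and aggregation are illustrative placeholders, not a finalized design:

```python
import pandas as pd

def extract(rows):
    # Extract: load raw edit records. In production these would come
    # from MariaDB replicas or data dumps; here `rows` is a plain list.
    return pd.DataFrame(rows, columns=["user", "edit_date"])

def transform(df):
    # Transform: parse dates and aggregate per user into a
    # graph-feedable shape for a retention chart.
    df["edit_date"] = pd.to_datetime(df["edit_date"])
    return df.groupby("user").agg(
        first_edit=("edit_date", "min"),
        edit_count=("edit_date", "count"),
    ).reset_index()

def load(df):
    # Load: hand the processed frame to the plotting layer
    # (returned here; a real pipeline might write to a cache or table).
    return df.to_dict(orient="records")

records = load(transform(extract([
    ("alice", "2022-01-01"), ("alice", "2022-02-01"), ("bob", "2022-01-15"),
])))
```

The same three stages stay in place regardless of whether the pipeline runs on demand or as a cron job; only the trigger around them changes.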
=====Possible Mentors
@Jayprakash12345
@KCVelaga
@Sadads
=====Have you contacted your mentors already?
Yes, I have contacted the mentors through Wikimedia's Zulip chat.
===Deliverables
====Timeline
=====May 20 - June 12
- Community bonding - connect with experts and fellow contributors.
- Refine the proposal by getting it reviewed by the mentors.
- Finalize the following:
-- Web framework based on stability, speed, simplicity, developer friendliness, etc.
-- UI design based on responsiveness, visual appeal, etc.
-- Visualization library and graph types based on usefulness, clarity, lack of ambiguity, etc.
-- Process type (on-demand, cron job, or preset).
-- Access restrictions, integrations, and other minor design decisions.
- Get a working understanding of the technologies that are required for the coding phase.
- Acquire the necessary permissions to work in the Wikimedia developer ecosystem.
=====June 13 - June 26
- Ramp up on the developer workflow and code standards.
- Build the infrastructure for the ETL pipeline.
- Set up the application server using Flask (or another web framework).
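The initial server from this phase could look like the following Flask skeleton; the route paths and the static response data are assumptions that stand in until the ETL pipeline is wired up:

```python
from flask import Flask, jsonify

app = Flask(__name__)

@app.route("/healthz")
def healthz():
    # Simple liveness endpoint for the deployed service.
    return jsonify(status="ok")

@app.route("/api/retention")
def retention():
    # Placeholder: will return retention metrics computed by the
    # ETL pipeline; hard-coded values until that is connected.
    return jsonify(intervals=["24h", "7d", "30d"], retained=[120, 80, 45])
```

Keeping the metrics behind a JSON endpoint from the start means the front-end and the pipeline can be developed independently in the later phases.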
=====June 27 - July 10
- Modify existing API/database permissions to allow required data to be queried by the service.
- Write relevant queries to import the appropriate data and convert it into a DataFrame (or any other data container).
- Explore if parallelization and stream reads are necessary, given the size of the data.
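The import step can be sketched with `pandas.read_sql` and a chunked (streaming) read, which keeps memory bounded on large tables; SQLite stands in for MariaDB here, and the `revision` table and column names are illustrative assumptions:

```python
import sqlite3
import pandas as pd

# Stand-in for a MariaDB replica connection; the table and column
# names (revision, rev_user, rev_timestamp) are illustrative only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE revision (rev_user TEXT, rev_timestamp TEXT)")
conn.executemany(
    "INSERT INTO revision VALUES (?, ?)",
    [("alice", "20220101000000"), ("bob", "20220115000000")],
)

# chunksize turns read_sql into an iterator of small DataFrames,
# i.e. a stream read instead of one monolithic load.
chunks = pd.read_sql(
    "SELECT rev_user, rev_timestamp FROM revision", conn, chunksize=1
)
df = pd.concat(chunks, ignore_index=True)
```

Whether chunking (or parallel workers per wiki/campaign) is actually needed depends on the data volumes measured in this phase.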
=====July 11 - July 24
- Clean and process the imported data to convert it into a suitable form ready to be consumed by the plots.
- Modify the data to accommodate the requirements for each of the graphs.
- Complete the mid-project report for phase-1 evaluation.
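One plausible shape for the processing step, assuming "retained after N days" means the user edited again at least N days after their first edit; the definition, the sample data, and the windows are assumptions to be finalized with the mentors:

```python
import pandas as pd

# Illustrative raw edits; a real run would use the imported frame.
edits = pd.DataFrame({
    "user": ["alice", "alice", "bob", "bob", "carol"],
    "edit_date": pd.to_datetime(
        ["2022-01-01", "2022-01-20", "2022-01-02", "2022-01-05", "2022-01-03"]),
})

# Days elapsed between each edit and that user's first edit.
first = edits.groupby("user")["edit_date"].transform("min")
edits["days_since_first"] = (edits["edit_date"] - first).dt.days

def retained(days):
    # A user is retained at `days` if any later edit falls at or
    # beyond that window after the first edit.
    return edits.loc[edits["days_since_first"] >= days, "user"].nunique()

retention = {d: retained(d) for d in (1, 7, 14)}
```

Each graph then consumes a small pre-aggregated frame like `retention` rather than the raw edit log.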
=====July 25
- Phase-1 Evaluation.
=====July 26 - August 7
- Develop the finalized graphs using the finalized visualization library.
- Develop the web controllers to accommodate the web pages.
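As a sketch of the graph-building step, assuming Matplotlib with the headless Agg backend for server-side rendering (the final library may well be Plotly or Bokeh instead, and the data shown is illustrative):

```python
import io

import matplotlib
matplotlib.use("Agg")  # headless backend for server-side rendering
import matplotlib.pyplot as plt

# Illustrative retention counts; the real values come from the
# processed DataFrame produced by the ETL pipeline.
intervals = ["1 day", "7 days", "30 days"]
retained_users = [120, 80, 45]

fig, ax = plt.subplots()
ax.bar(intervals, retained_users)
ax.set_xlabel("Time since first edit")
ax.set_ylabel("Retained users")
ax.set_title("Campaign user retention")

# Render to an in-memory PNG that a web controller can serve.
buf = io.BytesIO()
fig.savefig(buf, format="png")
```

Rendering into a buffer rather than a file keeps the controller stateless, which simplifies the later Dockerized deployment.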
=====August 8 - August 21
- Build the user interface using HTML/CSS and enable placeholders for data display.
- Forward the rendered graphs to the front-end for display.
- Test the webpage responsiveness and compatibility across browsers and devices.
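One hedged option for forwarding a graph to the front-end is to render it server-side and serve it as a PNG that the page embeds with a plain `<img>` tag; the route path and chart data below are illustrative assumptions:

```python
import io

from flask import Flask, send_file
import matplotlib
matplotlib.use("Agg")
import matplotlib.pyplot as plt

app = Flask(__name__)

@app.route("/chart/retention.png")
def retention_chart():
    # Render the retention chart on the server and stream the PNG
    # to the browser (data here is a hard-coded placeholder).
    fig, ax = plt.subplots()
    ax.bar(["1d", "7d", "30d"], [120, 80, 45])
    buf = io.BytesIO()
    fig.savefig(buf, format="png")
    buf.seek(0)
    return send_file(buf, mimetype="image/png")
```

The alternative, if an interactive library is chosen, is to send the underlying JSON to the browser and let the library render client-side; that choice is deferred to the design-finalization phase.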
=====August 22 - Sept 4
- Integrate, if required, with internal/external wiki pages.
- Dockerize the application, if required, and deploy the service.
- Perform end-to-end integration tests to expose bugs, security vulnerabilities, and other unexpected behavior.
=====Sept 5 - Sept 11
- Monitor the metrics and perform load tests to ensure scalability.
- Complete the necessary documentation guides (different from code documentation) and final project report document.
=====Sept 12 - Sept 19
- Final Evaluation.
===High-level system design
====Architecture diagram
{F35054378}
===Participation
=====Describe how you plan to communicate progress and ask for help, where you plan to publish your source code, etc
During the period of the program, I would do the following:
- Push my code to the designated remote repository after performing the required tests and addressing code review comments.
- Write detailed weekly reports through Wiki pages or my blog.
- Stay up-to-date with my goals as outlined in the timeline.
- Communicate regularly with mentors and keep them updated about my progress and challenges. Wikimedia mentors use Zulip chat for communication.
- Submit evaluations on time.
- Attend any program-related meetings that are hosted.
- Any other requirements set forth by the organization or GSoC.
===About Me
====Education
I completed my bachelor's in Computer Science in 2021, with a distinction, from Amrita University, which hosts one of India's top computer science programs. I have also completed multiple specialization courses in Data Science and Machine Learning.
=====How did you hear about this program?
I have known about GSoC for a long time and even submitted a proposal last year: https://akashsuper2000.github.io/blog/gsoc-2020-proposal
=====Will you have any other time commitments, such as school work, another job, planned vacation, etc, during the duration of the program?
I recently started working as a Software Engineer, after graduating in 2021. I have also been accepted into a university in the United States for my Master's in Computer Science. I would be available for the entirety of the program, except for one week (August 1st to August 7th, 2022), when I will be busy relocating. Neither my job nor my relocation would affect my ability to contribute to the program in any way.
=====We advise all candidates eligible for Google Summer of Code and Outreachy to apply for both programs. Are you planning to apply to both programs and, if so, with what organization(s)?
I am only applying through the Google Summer of Code program.
=====What does making this project happen mean to you?
Wikimedia's mission is to bring free education to the world, a mission that deeply resonates with me. This opportunity allows me to directly improve this system while learning new technologies, building critical infrastructure, and networking with people who share this vision. Specific to this project, I would put my data science skills to good use by enabling users to understand the impact of various campaigns, which translates into more efficient spending to grow the community. This is also my gateway to contributing to open source.
===Past Experience
====Describe any relevant projects that you've worked on previously and what knowledge you gained from working on them.
======Web development
Throughout my undergraduate years, I was involved in web development projects that let me build solutions with immediate impact. These include the 'Faculty Dashboard', built using ReactJS, which provides a centralized portal for the faculty of my institution, and the 'Voice-based transport inquiry system', built using Java Spring MVC, which features a built-in voice I/O system. I have also worked with Python web frameworks like Flask to build quick applications for deploying stats visualizations, running cron jobs, and hosting machine learning models.
**Links to applications that are hosted at the moment**
- COVID-19 dashboard using Flask: https://akashsuper2000.pythonanywhere.com/
- Python executor using Flask: https://akash2000.pythonanywhere.com/
- Faculty dashboard using ReactJS: https://akashsuper2000.github.io/faculty-dashboard/
======Data Science
I have hands-on experience with a range of projects that apply data science concepts such as clustering, hypothesis testing, ranking, regression, and SVMs, as part of the "Fundamentals of Data Science" course I took in college. Through the course, I worked with tools like NumPy, Pandas, Matplotlib, Seaborn, Plotly, and Bokeh, which should allow me to ramp up quickly in Wikimedia's development ecosystem.
======Big data
Through the "Big Data" course I took in college, and through my work as a Software Engineer at a large organization, I have had the opportunity to explore and work with big data tools in the Apache Hadoop ecosystem, such as MapReduce, Hive, and Pig.
======Databases
I have extensively used a variety of databases, including MySQL, MongoDB, Aurora RDS, DynamoDB, Cassandra, and Google BigQuery. I believe these experiences will enable me to transition smoothly into the MariaDB ecosystem at Wikimedia.
=====Other - Competitions
My efforts in a diverse set of projects are complemented by my involvement in hackathons and competitions. I have participated in numerous Kaggle competitions, securing multiple medals to rank among the top 200 globally. I have also participated in CTF contests where my team ranked top 100 nationally for two consecutive years.
====Describe any open source projects you have contributed to as a user and contributor (include links)
While I have a good number of open-sourced projects of my own, such as "license plate detection", "voice-based ticket booking system", and "COVID-19 tracker", I do not yet have first-hand experience contributing to an external open-source project. I believe this program would be a good starting point for exactly that. Moreover, through this program, I can build valuable connections in the community and move into active open-source participation and contribution.
===Other Information
====Pre-requisites for the project
=====Microtasks
I completed both microtasks assigned to evaluate my candidacy for this project and got them approved by the mentors.
- Microtask 1: https://phabricator.wikimedia.org/T304974
- Microtask 2: https://phabricator.wikimedia.org/T305309