
Develop a web based monitoring dashboard to improve and monitor existing database backup inventory processes and improve long term maintainability of existing code
Open, MediumPublic

Description

Profile Information

Name: Hari Krishna
IRC nickname on Freenode: hkrishna (#wikimedia channel)
Location : United Kingdom
Typical working hours: 9am-5pm, GMT+1 (BST)

Synopsis

Based on the task in T274636 (Database backup inventory improvements).
The project aims to build a simple web dashboard/webapp that will display and monitor data and metadata from the various database backups that are produced every day in WMF production environments. In addition, the web dashboard will report on the status of these backups and any errors, and will also show the status of past/ongoing/future backups. This will give WMF database administrators a good overview of the database backup processes and of whether they are working properly (or not!).

Subject to future discussion, these metadata will be exposed through APIs and then can be used to make a webapp to display the data. As this data is exposed through APIs, other applications can be built upon it.

In order to enable easier maintainability and collaboration among open source developers, we will try and maximize code coverage for testing in this project.
Having worked on the microtasks with other volunteers, I have come to understand that without unit/integration tests it is difficult to know whether existing features are affected by a new patchset, and that this can make collaboration between open source volunteers and developers difficult. As a goal, we will write unit/integration tests for any code that is re-used from the existing database backups repository (in addition to tests for any new code). As a stretch goal, we will try to maximise code coverage, as I feel this would improve the developer/volunteer experience.

  • Technology stack proposed for this project:

Backend
Python3/Flask -- Specific python version to be decided later on (depending on WMF production environment and whether we need to support Debian 9 and 10)
MariaDB v10.x for databases
These frameworks were chosen as the existing codebase is built using these technologies and the development team are also familiar with these technologies.

API First approach -- database inventory and backup metrics could be exposed through API end points and other applications could be built upon using data from these endpoints. (example -- external cURL script or data sources for Grafana dashboards used in WMF, or a web dashboard like what we are making currently)
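
As a rough illustration of the API-first idea, the sketch below shows the kind of JSON body a read-only endpoint could return for a cURL script, a Grafana data source or the dashboard to consume. It uses only the Python standard library so that it runs without Flask installed; a Flask route could return the same string. The `BackupRecord` fields, the `backups_to_json` helper and the sample host names are all hypothetical and are not taken from the existing codebase.

```python
import json
from dataclasses import asdict, dataclass


@dataclass
class BackupRecord:
    """Hypothetical shape of one backup's metadata (field names are assumptions)."""
    hostname: str
    backup_type: str   # "dump" or "snapshot"
    size_bytes: int
    date_taken: str    # ISO 8601 timestamp
    status: str        # "ongoing", "failed", "finished" or "scheduled"


def backups_to_json(records):
    """Serialise backup records into the JSON body an endpoint could return."""
    return json.dumps({"backups": [asdict(r) for r in records]}, sort_keys=True)


# Made-up sample data, for illustration only.
SAMPLE = [
    BackupRecord("db-host-1", "dump", 1024, "2021-06-01T00:00:00Z", "finished"),
    BackupRecord("db-host-2", "snapshot", 2048, "2021-06-01T01:00:00Z", "failed"),
]
```

Calling `backups_to_json(SAMPLE)` yields a single JSON object with a `backups` list, which keeps the response extensible (paging or summary fields could be added later without breaking clients).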

Frontend
Simple Bootstrap 5 with VanillaJS or jQuery which will consume the data from the APIs above and show the data on the page. These frameworks/languages were chosen as I believe these are very straightforward and easy to maintain for any developer who may not be working on front-end code on a daily basis as only basic HTML/CSS/JS skills are needed to add/modify features in code.
In addition, new contributors can contribute to our repository easily, which should increase open source volunteer engagement.

  • Possible Mentor(s): Jaime Crespo @jcrespo and Manuel Aróstegui @Marostegui
  • Have you contacted your mentors already? : Yes

Deliverables

  • Describe the timeline of your work with deadlines and milestones, broken down week by week. Make sure to include time you are planning to allocate for investigation, coding, testing and documentation

For development, we will follow agile methodology and try and deliver features agreed upon weekly or biweekly, with sprint reviews with mentors every week/biweekly.
I've broken down each week into tasks and deliverables.

For the mid evaluation milestone in the week starting July 12 (alpha), I aim to deliver the following:

  • A working front-end interface showing some of the data/metadata for backup files -- as a user, you should be able to see the list of backup files from different DB hosts and their associated metadata such as hostname, size, backup type (dump, snapshot) and date taken.
  • A working API backend with a limited set of read-only (GET) endpoints providing the data to support the above.
  • Integration of new code/repository on GitHub/Gerrit with existing WMF Jenkins CI (e.g. tox-docker?) -- as a developer, you should be able to check if the integration tests pass successfully.
  • Good tests for any new code we have written so far (the focus is on writing good tests rather than on coverage).
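
To make the "limited GET endpoints plus good tests" idea more concrete, here is a minimal sketch of a read-only endpoint and an in-process test for it. It is written as a plain WSGI callable with `unittest` so that it runs with the standard library alone; the real project is expected to use Flask, whose test client plays the same role as the `call` helper here. The `/backups` path, the `BACKUPS` rows and all function names are assumptions made for illustration.

```python
import json
import unittest

# Made-up rows standing in for the backup metadata store.
BACKUPS = [
    {"hostname": "db-host-1", "type": "dump", "size": 1024, "date": "2021-06-01"},
    {"hostname": "db-host-2", "type": "snapshot", "size": 2048, "date": "2021-06-02"},
]


def app(environ, start_response):
    """Tiny WSGI application exposing a single read-only endpoint."""
    if environ["REQUEST_METHOD"] == "GET" and environ["PATH_INFO"] == "/backups":
        body = json.dumps(BACKUPS).encode("utf-8")
        start_response("200 OK", [("Content-Type", "application/json")])
        return [body]
    start_response("404 Not Found", [("Content-Type", "text/plain")])
    return [b"not found"]


def call(path, method="GET"):
    """Invoke the app in-process and return (status, body) without a server."""
    captured = {}

    def start_response(status, headers):
        captured["status"] = status

    chunks = app({"REQUEST_METHOD": method, "PATH_INFO": path}, start_response)
    return captured["status"], b"".join(chunks)


class BackupsEndpointTest(unittest.TestCase):
    def test_lists_backups(self):
        status, body = call("/backups")
        self.assertEqual(status, "200 OK")
        self.assertEqual(len(json.loads(body)), 2)

    def test_unknown_path_is_404(self):
        status, _ = call("/nope")
        self.assertEqual(status, "404 Not Found")
```

Saved to a file, the tests can be run with `python -m unittest <file>`, which is also the kind of command a tox-based CI job could invoke.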

For the final evaluation milestone in August, I aim to deliver a completed database backup and inventory monitoring webapp/solution which will help WMF DB admins to monitor the backup processes. By the final evaluation, the following should be done:

  • A completed front-end interface showing all of the data/metadata for backup files and objects -- as a user, you should be able to see a list of all the existing backups from various DB hosts, their metadata (e.g. size, and whether each is a dump or a snapshot) and the statuses of backup jobs (ongoing, failed, finished, scheduled), with a detailed view of when the backup process was last run and on which hosts.
  • You can monitor the backup processes within the WMF DB instances and see whether there were errors (such as backup failures).
  • A completed API backend with the necessary endpoints for obtaining the backup data above, which can be parameterised (e.g. via POST).
  • Good test coverage (aiming for 60-80%) for any code used in the project
  • Good documentation using sphinx
  • Any stretch goals that have been agreed beforehand.
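
One possible reading of "parameterised" endpoints and error monitoring is sketched below: request parameters are turned into simple equality filters over the backup list, and a helper reports which hosts have failed backups. This is only an illustration under assumed names; `ALLOWED_FILTERS`, `filter_backups`, `failed_hosts` and the sample rows are hypothetical, and the real filtering would more likely happen in SQL.

```python
# Fields a client is allowed to filter on (an assumption for this sketch).
ALLOWED_FILTERS = {"hostname", "type", "status"}

# Made-up rows using the job statuses listed in the deliverables.
BACKUPS = [
    {"hostname": "db-host-1", "type": "dump", "status": "finished"},
    {"hostname": "db-host-1", "type": "snapshot", "status": "failed"},
    {"hostname": "db-host-2", "type": "dump", "status": "ongoing"},
    {"hostname": "db-host-3", "type": "dump", "status": "failed"},
]


def filter_backups(backups, **params):
    """Return the backups whose fields equal every supplied parameter."""
    unknown = set(params) - ALLOWED_FILTERS
    if unknown:
        # Reject unexpected parameters instead of silently ignoring them.
        raise ValueError(f"unsupported filter(s): {sorted(unknown)}")
    return [b for b in backups if all(b.get(k) == v for k, v in params.items())]


def failed_hosts(backups):
    """Return the sorted, de-duplicated host names that have a failed backup."""
    return sorted({b["hostname"] for b in filter_backups(backups, status="failed")})
```

For example, `filter_backups(BACKUPS, hostname="db-host-1", type="dump")` narrows the list to a single record, while `failed_hosts(BACKUPS)` gives the hosts a DB admin would want to look at first.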

The plan below assumes that we will have a programming sprint of 2 weeks, where at the end of every sprint, a sprint review/retrospective will be done with the mentors.

Set-up and Introduction Phase (week 1 - 17th-23rd May) (15 hours)

  • Get to meet and know mentors and their work, understand their ways of working and availabilities.
  • Understand the backend infrastructure of WMF
  • Get to understand the project
  • Onboarding into WMF development resources/environments (eg. WMF Cloud, staging)
  • Understand bigger picture of the project and how it fits into existing infrastructure

Investigation Phase (week 2 - 24th May - 30th May) (15 hours)

  • Understanding problems that need to be solved by gathering requirements from mentors through user stories
  • Understanding any constraints for our project and ensuring we have the right resources
  • Create rough mock ups for UI based on requirements gathered
  • Create overview of system design (mock up)
  • Present and gather feedback

Refinement phase (week 3 - 31st May - 6th June) (15 hours)

  • Based on feedback received, refine and agree upon stories and features for first evaluation (Alpha)
  • Evaluate whether we are using any existing code from the codebase, such as wmfmariadb, and evaluate whether we need to include test coverage for dependencies
  • Agree upon product delivery goals and testing goals for Alpha (mid evaluation milestone)
  • Convert goals into stories, following agile methodology and delivering stories weekly/biweekly and performing sprint review/retrospective (subject to mentor availability - may change)

Alpha development - Sprint 1 (Week 4 - 7th June to 13th June) (15 hours)

  • Set up framework for projects, create repositories in Gerrit/GitHub, any Jenkins jobs, etc.
  • Develop basic skeleton for back-end Flask/Django, basic front-end skeleton for the dashboard
  • Understand how to work with WMF development/staging environments and how to integrate our project with WMF infrastructure.

Alpha development - Sprint 1 (Week 5 - 14th June to 20th June) (15 hours)

  • Create front-end design (Bootstrap/JS)
  • Integrate program with existing codebase for triggering/obtaining backup data/metadata
  • Add testing coverage for features developed in the previous week, finish off any left over work.
  • Sprint review / Code review, retrospective/feedback session from mentors
  • Agree upon goals for next sprint

Alpha development - Sprint 2 (Week 6+7 - 21st June to 4th July) (32 hours)

  • Create code for database CRUD operations
  • Create APIs for exposing some of the data through back-end
  • Integrate front-end (Bootstrap/JS) code with back-end APIs (Python) code
  • Create front-end code (Bootstrap/JS) to work with the APIs
  • Add functionality to perform Jenkins CI builds for new repository on WMF using existing tox-docker job.
  • Sprint review and feedback
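
The CRUD operations planned for this sprint could look roughly like the sketch below. MariaDB is the database proposed for the project; the standard library's `sqlite3` with an in-memory database is swapped in here purely so that the example runs without a database server. The `backups` table, its columns and the function names are hypothetical.

```python
import sqlite3


def connect():
    """Open an in-memory database with a hypothetical backups table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE backups ("
        "id INTEGER PRIMARY KEY, hostname TEXT NOT NULL, "
        "type TEXT NOT NULL, status TEXT NOT NULL)"
    )
    return conn


def create_backup(conn, hostname, backup_type, status):
    """Insert one backup row and return its generated id."""
    cur = conn.execute(
        "INSERT INTO backups (hostname, type, status) VALUES (?, ?, ?)",
        (hostname, backup_type, status),
    )
    conn.commit()
    return cur.lastrowid


def read_backup(conn, backup_id):
    """Return (hostname, type, status) for the id, or None if it is missing."""
    return conn.execute(
        "SELECT hostname, type, status FROM backups WHERE id = ?", (backup_id,)
    ).fetchone()


def update_status(conn, backup_id, status):
    """Change the status of an existing backup row."""
    conn.execute("UPDATE backups SET status = ? WHERE id = ?", (status, backup_id))
    conn.commit()


def delete_backup(conn, backup_id):
    """Remove a backup row."""
    conn.execute("DELETE FROM backups WHERE id = ?", (backup_id,))
    conn.commit()
```

All statements use placeholders rather than string formatting, which matters once values start arriving from API requests.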

Alpha (First Milestone) ready for evaluation - Test, documentation and cleanup - Sprint 3 (Week 8+9 - 5th July to 18th July) (20 hours)

  • Begin documentation process using sphinx
  • Ensure good testing, clean up code and make it ready for first submission
  • Any other last minute fixes
  • Obtain feedback from mentors post evaluation
  • Agree upon requirements/stories, acceptance criteria for final product
  • Agree upon any change of scope and/or fixes for final submission
  • Agree upon any stretch goals that we can do towards the end (if we have time)

Final product development - Sprint 4 (Week 10+11) (20th July to 2nd August) (20 hours)

  • Finish backlog from pre-evaluation
  • Ensure good test coverage for any existing codebase by writing unit/integration tests (or any reused codebase from wmfmariadb)
  • Complete API design to expose all of the data
  • Complete work on front-end to work with the completed API design

Final development, testing, documentation - Sprint 5 (week 12+13) (3rd August to 17th August) (32 hours)

  • Backlog from previous sprint, if any
  • Complete Sphinx documentation
  • Ensure we have fully completed all stories and requirements to meet acceptance criteria
  • Perform code reviews with mentors, review and feedback
  • Acceptance testing of requirements with mentors

Final week of programme, preparing final milestone for evaluation - Sprint 6 (week 14+15) (18th August to 30th August) (20 hours)

  • Fix any bugs relating to issues observed in pre-production/staging environment
  • Prepare program for final submission, ensure the repository README and the documentation are up to date
  • Time reserved for any stretch goals

Participation

I am open to additional suggestions for the below, and am very flexible with my schedule.
Participation during project

  • Mentorship from Manuel and Jaime - I will arrange a weekly standup video call / screen share over Google Hangouts (1hr or more -- subject to mentor availability) with mentors to discuss progress and tasks and to obtain continuous feedback and guidance. This could take the form of a sprint retrospective and could be weekly or bi-weekly.
  • Direct communication with mentors through IRC/Zulip for any questions that spring up in the heat of the moment during development (I don't expect any response outside your working hours)
  • Weekly reports as recommended by Wikimedia in the form of blog posts or shared Trello/Kanban board to document progress of project
  • Project code to be published open-source in either Gerrit and/or Github, subject to discussion with mentors during project.

Participation after end of programme

  • Completion of any project backlog (if for some reason we still have some minor backlog remaining)
  • Completion of any stretch goals that were agreed beforehand and couldn't be completed due to time constraints (hopefully this doesn't happen)
  • Performing any final additions so that the project is ready for deployment into production (could be a stretch goal but this depends)
  • Contributing to open source by being a maintainer of the project in the longer term

About Me

Tell us about a few:

  • Your education (completed or in progress)

Currently, I am a 4th year undergraduate student in Computer Science in the UK. Over the course of my degree, I have done trainee programmes/internships for a large employer, with 7+ months of industrial experience as an intern.

  • How did you hear about this program?

Through another person who participated in the program many years ago -- I wasn't sure about my skills back then hence I am applying now :)

  • Will you have any other time commitments, such as school work, another job, planned vacation, etc, during the duration of the program?

I have university and life commitments on working days, but otherwise I am free most of the time, including weekends and evenings. I will make time as follows (GMT+1 / BST):

Monday - Friday : 9am-1pm, sometimes 5pm-8pm

Saturday : 9am-5pm

Sunday : 9am-5pm (if needed)

Time commitment: 12-16 hours per week

I will try to be contactable during sociable/fixed hours. I aim to spend 12-16 hours a week working on the project and the schedules above are extremely flexible -- this will help me balance between life, university work and GSoC work.

  • We advise all candidates eligible for Google Summer of Code and Outreachy to apply for both programs. Are you planning to apply to both programs and, if so, with what organization(s)?

This is the first time I have heard about Outreachy. I am not sure whether I am eligible, but I can try next time, as I think the deadline has passed.

  • What does making this project happen mean to you?

As a student I've only had the chance to contribute using my skills in various code bases that are often closed-source and not necessarily visible or used by the public.
I always wanted to learn how to contribute to the open source software community using skills I have learned from university and in industry. I've learned to do this through the microtasks (my first ever open source contribution!) and I've been able to learn good coding standards in Python through code reviews and this has greatly helped me. I've had the opportunity to interact and help fellow volunteers as well. I would like to give back and learn more from other developers/volunteers and with this project, I believe I will be able to do just that with my skillset and also get a good mentorship experience by helping build a good DB monitoring application which will benefit WMF DB Admins in the long run. In addition I chose Wikimedia Foundation as I am fond of the work that they do to ensure distribution of free knowledge to the world and educating everyone -- I've always relied on Wikipedia as a kid for knowledge and I am happy to contribute back!

Past Experience

  • Describe any relevant projects that you've worked on previously and what knowledge you gained from working on them.

Over the course of my degree, I have done trainee programmes/internships for a large employer, with 7+ months of industrial experience (as of writing), where I had to create and write applications to monitor critical infrastructure using test driven development. I learned that this is very useful, as it makes building upon and modifying code easy and straightforward, especially when working with different developers. I've also learned the importance of site reliability and its metrics.

I have also done projects at university where I acted as an external consultant developing solutions for a real-world client, with Python, Tornado and MySQL for the back end and simple Bootstrap/JS/jQuery for the front end, creating a webapp with both front-end and back-end components, which would then expose datapoints for Grafana dashboards. That being said, I still need to look up online if I need to center a div :)
Over the course of university, I've had good exposure to web technologies such as Node.js and to creating REST APIs with them, and I have spent about an additional 6+ months on these technologies over the past years in various projects at university.

I was also able to make my first ever open source contributions (pending review), which are linked in the sections below.

  • Describe any open source projects you have contributed to as a user and contributor (include links).

In spite of all the closed-source experience, I have not had a chance to contribute to open source until I discovered the GSoC program through a friend who recommended it and I found Wikimedia projects interesting -- the database project interested me and I had a look at the good first tasks.
I'm happy that I was able to make my first ever open source contribution and here are my contributions T277160 T277162

Frankly, this helped me understand the project and also gave me an avenue to improve my coding and my communication skills with other volunteers and professionals.
In both of these tickets I was able to contribute good unit tests. Working on them also helped me uncover a small bug in the code, for which I raised T277754.
I've also had the chance to work with another developer on that ticket, where I was able to help them out through code reviews.

  • You must have written a feature or bugfix for a Wikimedia project during the application phase (see the section about microtasks in the application process steps), please link to it here. We give strong preference to candidates who have done so.

T277160 (not merged yet)
T277162 (not merged yet)
T277754 (raised a ticket and provided support through code reviews)


Event Timeline

Hi @jcrespo and @Marostegui -- this is my proposed GSoC application -- apologies for the formatting, still getting used to it. I will try and improve it.
I appreciate there are time constraints due to work commitments on your side and there are a lot of applications to review, and I was wondering if you could give me some quick feedback on the scope? The scope assumes that this project will build upon the existing code base and therefore we may need to re-engineer it to ensure good test coverage.

Restricted Application added a subscriber: Zabe. · Apr 7 2021, 3:26 PM

> apologies for the formatting

No need to "fight" Phabricator and work on the formatting; as long as it is readable it is ok. :-)

The important thing is the content. Will give it a look now, sorry, I had missed it at first.

  • You should register a nick on Freenode if selected, but that is not important for the proposal.
  • I like it in general, you have a nice introduction and develop a lot of detail that is interesting and relevant, so that is positive (you have the ideas clear).
  • My suggestion for communication is weekly meetings, but because there are fewer hours this year, we could change that, or maybe make the meetings shorter.
  • While the development roadmap is clear, and I like the iterative approach (short sprints), one thing that is "imposed" on us is the 2 milestones, mid-project and final (more details on the GSoC website timeline). It would be nice to be more explicit about the promises for the mid and final evaluation (with room for delays in your plan) to make those more relevant. E.g. "By 12 July, I will deliver X", "By August 26 I will deliver Y". While these are related to the roadmap, they should be about the what, not the how, e.g. "Work on API frontend" vs "You will be able to read a list of backups on the website". I hope what I mean is clear. If you could either add an additional section about deliverables, or put it in between the roadmap, I think it would benefit your application. Be conservative: it is ok to underpromise even if you have time allocated for more work, as there will always be challenges.
  • Please do not promise to deploy anything, especially into production. While deployment is part of the development process, we cannot promise it will be deployed into production, and even if it is, it is likely to be done much later in the development process. For example, last year's student developed a "version 1.0" successfully, but it took weeks to update that on production, due to constraints unrelated to development (dependencies, extra testing, validation, etc.). A test deployment is more reasonable, but again, cannot be guaranteed, in case there are delays on cloud project approval, etc. Promise to have a working demo, and we will try to have it hosted at Wikimedia, but do not promise it, because it is something that will be mostly out of your control. In general, any work related to infrastructure will be out of scope, especially given the limited number of hours available. Feel free to reduce the scope and use the hours for more coding/designing/testing.
  • You can remove the "Any other info" section if you don't have any further info :-) It is not necessary. This is formatting and not important for now, mostly for the final submission.
  • You are not expected to be contactable for many hours. Most likely, we will set a band for real-time communication, and the rest can happen asynchronously (in addition to regular meetings).

Hey @h.krishna

Thanks for showing your interest in participating in Google Summer of Code with the Wikimedia Foundation! Please make sure to upload a copy of your proposal on Google's program site as well, in whatever format is expected of you, and include in it this public Phabricator proposal, before the deadline, i.e. April 13th. Good luck :)

Thank you for your review @jcrespo
I have created a new nick on freenode (hkrishna)

Regarding development roadmap, I have added the following Milestones. When possible, let me know what you think

For the mid evaluation milestone in the week starting July 12 (alpha), I aim to deliver the following:

- A working front-end interface showing some of the data/metadata for backup files -- as a user, you should be able to see the list of backup files from different DB hosts and their associated metadata such as hostname, size, backup type (dump, snapshot) and date taken.
- A working API backend with a limited set of read-only (GET) endpoints providing the data to support the above.
- Integration of new code/repository on GitHub/Gerrit with existing WMF Jenkins CI (e.g. tox-docker?) -- as a developer, you should be able to check if the integration tests pass successfully.
- Good tests for any new code we have written so far (the focus is on writing good tests rather than on coverage).

For the final evaluation milestone in August, I aim to deliver a completed database backup and inventory monitoring webapp/solution which will help WMF DB admins to monitor the backup processes. By the final evaluation, the following should be done:

- A completed front-end interface showing all of the data/metadata for backup files and objects -- as a user, you should be able to see a list of all the existing backups from various DB hosts, their metadata (e.g. size, and whether each is a dump or a snapshot) and the statuses of backup jobs (ongoing, failed, finished, scheduled), with a detailed view of when the backup process was last run and on which hosts.
- You can monitor the backup processes within the WMF DB instances and see whether there were errors (such as backup failures).
- A completed API backend with the necessary endpoints for obtaining the backup data above, which can be parameterised (e.g. via POST).
- Good test coverage (aiming for 60-80%) for any code used in the project
- Good documentation using sphinx
- Any stretch goals that have been agreed beforehand.

I understand publishing code into production can be difficult, so I have removed it and will reuse the time for testing/dev work
I will also remove the (Any other info) section, unless I think of something

@Gopavasanth
Thank you, in order to include the copy of the proposal submitted - will it be enough to add the Google Doc URL into this ticket under "Profile Information" section? Please let me know when possible

No need to include a copy here, we will be able to see through Google's website after the deadline finishes. The problem was we would be unable to see it beforehand and provide feedback, but we will be able to see the final submission after the deadline.

h.krishna renamed this task from "Develop a web monitoring dashboard to improve/monitor existing database backup inventory processes and improve long term maintainability of existing code" to "Develop a web based monitoring dashboard to improve and monitor existing database backup inventory processes and improve long term maintainability of existing code". · Apr 13 2021, 3:47 PM
h.krishna updated the task description.

GSoC application deadline has passed. If you have submitted a proposal on the GSoC program website, please visit https://phabricator.wikimedia.org/project/view/5104/ and then drag your own proposal from the "Backlog" to the "Proposals Submitted" column on the Phabricator workboard. You can continue making changes to this ticket on Phabricator and have discussions with mentors and community members about the project. But, remember that the decision will not be based on the work you did after but during and before the application period. Note: If you have not contacted your mentor(s) before the deadline and have not contributed a code patch before the application deadline, you are unfortunately not eligible. Thank you!

Hi all, just became aware that the project was accepted. Thank you, I feel humbled and I am quite excited about this

How shall we communicate? Where do we make the initial communication?
I'm there on Zulip (Hari Krishna) and also available on Freenode #wikimedia channel as hkrishna. I can see @jcrespo on Zulip
Let's get in touch and arrange a videocall or similar when we are free
@Marostegui @jcrespo

Congratulations @h.krishna!
We probably prefer IRC for async communication.
Let's work out a day/time to arrange a video call

Thank you @Marostegui
Sure, sounds great. What are your timezones and your availabilities this week for a 1hr (ish) meeting?
My timezone is GMT+1 (so it's 8.36AM when I posted this)
I should be generally free this week (except 10am-11am and 2-3pm, just have some university stuff)
I am on IRC channel #wikimedia, my IRC handle is hkrishna. Which channels do I join? I can't seem to find your IRC nick (marostegui) or jcrespo's nick (jynus) on #wikimedia. Perhaps I could use some help on what channels I should be joining.
I've DMed @jcrespo on Zulip as well
Any preference for video call software? (i.e Zoom, Google Hangouts, etc)

Change 692905 had a related patch set uploaded (by Jcrespo; author: Jcrespo):

[integration/config@master] Add H.krishna123 to the list of trusted users

https://gerrit.wikimedia.org/r/692905

Change 692905 merged by jenkins-bot:

[integration/config@master] Zuul: Add H.krishna123 to the list of trusted users

https://gerrit.wikimedia.org/r/692905

Mentioned in SAL (#wikimedia-releng) [2021-05-19T16:39:41Z] <James_F> Zuul: Add H.krishna123 to the list of trusted users T279552

I've requested new repo operations/software/bernard at https://www.mediawiki.org/wiki/Gerrit/New_repositories/Requests (usually they are super-fast to create it).

Just putting this here for awareness and for any future candidates
Links to my GSoC bi-weekly report/blog can be found here
https://www.mediawiki.org/wiki/Google_Summer_of_Code/2021/Bi-weekly_Reports

Direct link to my report (May 31st)
https://github.com/darkbluee/wikimedia-gsoc2021/blob/main/posts/31-05-2021.md

Putting the 2nd bi-weekly Wikimedia GSoC report here -- hopefully this is helpful for future candidates to see the story gathering/system design mocks process :)

Direct link to my report (June 14th)
https://github.com/darkbluee/wikimedia-gsoc2021/blob/main/posts/14-06-2021.md