
Proposal for "Create or improve a tool for monitoring or automating tasks for Wikimedia databases"
Closed, Declined; Public


Name: Zekai Wang
GitHub Account:
Major: Computer Science
Expected Graduation Date: 2021
How many hours will you work per week? 56 (8 hours a day, 7 days a week)
EDUCATION: University of California, Berkeley, CA, U.S.
BS, Junior Student, Undergraduate Researcher


Project Name
Create or improve a tool for monitoring or automating tasks for Wikimedia databases

I’m interested in the project “Create or improve a tool for monitoring or automating tasks for Wikimedia databases” offered by the Wikimedia Foundation. I am currently a junior majoring in Computer Science at UC Berkeley. I’m familiar with Flask, HTML, SQL, and related tools, and I have worked on several health-care projects in which I built platforms for collecting, displaying, and analyzing data. I think this experience may be helpful for this project.

1 Flask
I chose Flask as the web application framework because it is lightweight and well suited to a web platform like ours.
It is designed to make getting started quick and easy, with the ability to scale up to complex applications. It began as a simple wrapper around Werkzeug and Jinja and has become one of the most popular Python web application frameworks.

2 Database
Why use a database? There are two reasons. First, receiving the data and querying it happen asynchronously; without a database in between, the two can conflict and crash the program. Second, a database is a good choice when dealing with large amounts of data.
MariaDB is a free, open-source relational database that the application connects to for storing and managing its data. It is a stable, reliable, and powerful solution with advanced features.
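
The decoupling described above can be sketched as follows. This is only an illustration: it uses Python's built-in sqlite3 module as a stand-in for MariaDB so that it runs without a server, and the table name `readings` and all function names are hypothetical, not part of the proposal.

```python
import sqlite3


def open_store(path=":memory:"):
    """Open the store and create the (hypothetical) readings table."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS readings ("
        "id INTEGER PRIMARY KEY AUTOINCREMENT, "
        "ts REAL NOT NULL, "
        "value REAL NOT NULL)"
    )
    return conn


def save_reading(conn, ts, value):
    """Called by the receiving side whenever a new sample arrives."""
    with conn:  # commits on success, rolls back on error
        cur = conn.execute(
            "INSERT INTO readings (ts, value) VALUES (?, ?)", (ts, value)
        )
    return cur.lastrowid


def readings_since(conn, last_id=0):
    """Called by the display side; returns only rows newer than last_id."""
    return conn.execute(
        "SELECT id, ts, value FROM readings WHERE id > ? ORDER BY id",
        (last_id,),
    ).fetchall()
```

The writer and the reader never call each other; they only share the table, so a slow query does not block incoming data. Passing the last seen `id` to `readings_since` is one way to avoid reading the same rows twice.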

3 Socket.IO
Socket.IO is a library that enables real-time, bidirectional and event-based communication between the browser and the server.
We want to push data to the browser in real time, every time the server receives new data, instead of the traditional method in which the browser requests the data periodically (which can lead to the same data being fetched repeatedly), so we use Socket.IO.
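
The difference between pushing and periodic requests can be illustrated without any network code. The sketch below is plain Python and not the Socket.IO API: the hypothetical `Channel` class plays the role of the server-side event emitter, each subscriber callback plays the role of a connected browser, and `naive_poll` models a periodic request that does not track what it has already seen.

```python
class Channel:
    """Toy stand-in for a server-push channel (not the Socket.IO API)."""

    def __init__(self):
        self._subscribers = []
        self.history = []

    def subscribe(self, callback):
        """Register a callback, analogous to a browser opening a connection."""
        self._subscribers.append(callback)

    def publish(self, sample):
        """Store the sample and push it to every subscriber exactly once."""
        self.history.append(sample)
        for callback in self._subscribers:
            callback(sample)


def naive_poll(channel, last_n=2):
    """Polling without a cursor: returns the latest samples on every call."""
    return channel.history[-last_n:]


channel = Channel()
pushed = []
polled = []
channel.subscribe(pushed.append)
for sample in [10, 11, 12]:
    channel.publish(sample)
    polled.extend(naive_poll(channel))
```

After the loop, `pushed` holds each sample once, while `polled` contains repeated samples because every request returns data that an earlier request already delivered.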

4 Highcharts
Highcharts is a pure JavaScript library that offers an easy way of adding interactive charts to a website or web application. I use it to build the cardiogram chart. The chart supports:
(1) Selecting a historical data range
(2) Downloading the chart as a picture in different formats
(3) Showing the data in real time
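
On the server side, feeding such a chart mostly means shaping the stored samples into the point format the chart consumes. The sketch below assumes samples are stored as (Unix seconds, value) pairs and converts them to `[milliseconds, value]` pairs, the form Highcharts accepts for a series on a datetime axis. The function names and the series name `pulse` are hypothetical.

```python
import json


def to_series_points(readings, start=None, end=None):
    """Convert (unix_seconds, value) pairs into [ms, value] points.

    start and end are optional inclusive bounds in Unix seconds,
    used to select a historical range.
    """
    points = []
    for ts, value in readings:
        if start is not None and ts < start:
            continue
        if end is not None and ts > end:
            continue
        points.append([int(round(ts * 1000)), value])
    return points


def to_series_json(readings, name="pulse", **bounds):
    """Serialize one series entry (name plus data points) as JSON."""
    return json.dumps(
        {"name": name, "data": to_series_points(readings, **bounds)}
    )
```

The same function serves both the historical view (with `start` and `end` set) and the live view (called on the newest samples only).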

5 Bootstrap Table
Bootstrap Table is an extended table component that integrates with some of the most widely used CSS frameworks (Bootstrap, Semantic UI, Bulma, Material Design, Foundation). The data flow is:
(1) The user’s browser requests the newest analyzed data every 5 seconds
(2) Flask receives the request and selects the data from the database
(3) Flask returns the analysis results to the user’s browser, which shows them using Bootstrap Table
The table is designed to record historical data and to let users download the analysis results in different formats (such as CSV, TXT, and Excel).
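
The server side of this flow can be sketched with the standard library alone. The example assumes the database returns rows as `(id, ts, value)` tuples and that the table is loaded from a URL returning a JSON array of objects; the column names and function names are hypothetical, and the Flask view that would wrap `records_to_json` is left out so the sketch runs without extra packages.

```python
import csv
import io
import json

FIELDS = ("id", "ts", "value")  # hypothetical column names


def rows_to_records(rows):
    """Turn database tuples into one dict per table row."""
    return [dict(zip(FIELDS, row)) for row in rows]


def records_to_json(records):
    """Body a Flask view could return for the table's periodic request."""
    return json.dumps(records)


def records_to_csv(records):
    """Render the same records as CSV text for download."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=FIELDS, lineterminator="\n")
    writer.writeheader()
    writer.writerows(records)
    return buffer.getvalue()
```

Keeping the conversion separate from the web framework makes it easy to test and lets the JSON response and the CSV download share one source of truth. Whether exports are generated on the server like this or in the browser is a design choice the proposal leaves open.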

6 Summary Article
After the coding part is completed, I will write a summary article describing what was achieved. The article will explain the whole platform, including the data analysis procedure, the details of the algorithm, and instructions for using the platform. I think my experience writing articles in the lab will help with this part. The article will be posted on GitHub or wherever is appropriate.

Related Experience

Liwei Lin Lab, Group Member, UC Berkeley, U.S.
I am currently doing research with Prof. Lin at Berkeley and have worked on two research projects.

  1. I developed a platform that receives pulse signal data from a sensor and sends it to a cloud platform for feature extraction. I built the entire platform for the project, including signal collection, chart display, and data analysis, which is related to the aim of this project. Simple demo:
  2. Machine Learning Aided Design. Machine learning has proven effective for problems with vast design spaces, such as protein engineering. We use machine learning to design tough composites (2D) and metasurfaces; this work is still at an initial stage.

National University of Singapore Summer Workshop 2019, Visiting student, NUS, Singapore July 2019 – Aug 2019

  1. Used Mask R-CNN and InceptionV3 for object detection and garbage sorting, and used MQTT to offload computation from the Raspberry Pi to a notebook’s GPU
  2. Used transfer learning to give InceptionV3 high recognition accuracy. The link to the complete GitHub project is on the project homepage. Project homepage:

Chinese undergraduate computer design contest, National Second Prize, Nanjing, Jiangsu, China July 2018
Segmentation detection of bladder tumor images using “Fully convolutional networks (FCN)”.

Planning Project of Innovation and Entrepreneurship Training of National Undergraduate, Team Leader

  1. Elected by other project members to promote growth and success of the national project
  2. The aim is to build a hospital-oriented platform for detecting bladder tumors in magnetic resonance imaging, in order to help radiologists detect bladder cancer in MRI urography and to assist physicians in sharing the database

Chinese undergraduate computer design contest, National Third Prize, Changchun, Jilin, China July 2019
Grading and staging prediction of bladder tumor based on magnetic resonance imaging and ResNet50 algorithm

The 5th China College Students' "Internet+" Innovation and Entrepreneurship Competition, Team Leader
Completed the research tasks by combining YOLO with the DeepSORT algorithm to meet the project requirements, with good results

Expected Timeline

Prior - May 4
Get familiar with the structure and source code of this project. Consider whether other user preferences need to be supported.
Keep diving into the project’s code by fixing issues.

May 4 - 30
Analyse code samples and try to find common properties of the ones with optimal method order.
Start development using Flask and the other packages mentioned above.
Discuss the implementation of the project with mentors. Investigate classic algorithms that we can refer to.

June 1 - 30
Implement the algorithm and validate it on more samples.
Continue development and start coding the advanced version.
Write an evaluation document.

July 1 - 28
Fix bugs in the algorithm.
Code the additional analysis algorithm. Test the automation process with the new analysis method.
Write an evaluation document.

July 29 - August 14
Update the project documentation.
Extend the test cases if needed.

August 14 - 29
Write a summary article covering the whole project.
Buffer for unexpected delays.
Try some more algorithms if I have extra time.

Extra Information

Working Time
I will be based in China during the summer, so I will be working in the GMT+8 time zone. I will be available on most weekdays and weekends and can work 8 hours every day.
Before the application results are announced, I can start diving into the code. I have no vacation trips planned this summer. All courses will end before June, and there are no examinations this semester. I currently have no internship.
Looking forward to participating with the Wikimedia Foundation and joining Google Summer of Code!

Reason for Participation
I have always wanted to be a great developer since the first day I learnt programming. GSoC gives me a chance to contribute to open source projects with mentorship from great developers all over the world. I believe that is really amazing. If I have the chance to participate in GSoC and work with the Wikimedia Foundation, I will try my best to complete this project.

I am looking forward to working on the project with Wikimedia Foundation!

Event Timeline

Zekai99 renamed this task from Create or improve a tool for monitoring or automating tasks for Wikimedia databases to Proposal for "Create or improve a tool for monitoring or automating tasks for Wikimedia databases". Apr 7 2020, 9:54 PM

Hi and welcome! :) For the records, it looks like this task was created after the Outreachy application deadline (April 7, 2020 at 16:00 UTC).

More important than that, this was a GSoC-only project- not because I have anything against Outreachy (ofc), but because I can just commit to mentor 1 person during summer, and chose going through GSoC. Given the 7-day delay, I don't think that would be fair for the other students that adhered to the deadline. I was not even aware this proposal was in the works, when mentor feedback is highly recommended. Sorry.

I am very sorry for the late submission. In fact, I found this project on GSoC, was very interested in it, and submitted the proposal before March 31. But I didn't notice that I needed to submit the proposal again on Phabricator. I discovered this requirement when I checked the project status on GSoC yesterday. Sorry again for making such a mistake, and I would appreciate it if I could still be a candidate for the project.

Pavithraes added a subscriber: Pavithraes.

@Zekai99 We are sorry to say that we could not allocate a slot for you this time. Please do not consider the rejection to be an assessment of your proposal. We received over 100 quality applications, and we could only accept 14 students. We were not able to give all applicants a slot that would have deserved one, and these were some very tough decisions to make. Please know that you are still a valued member of our community and we by no means want to exclude you. Many students who we did not accept in 2019 have become Wikimedia maintainers, contractors and even GSoC students and mentors this year!

If you would like a de-brief on why your proposal was not accepted, please let me know as a reply to this comment or on the ‘Feeback on Proposals’ topic of the Zulip stream #gsoc20-outreachy20. I will respond to you within a week or so. :)

Your ideas and contributions to our projects are still welcome! As a next step, you could consider finishing up any pending pull requests or inform us that someone has to take them over. Here is the recommended place for you to get started as a newcomer:

If you would still be eligible for GSoC next year, we look forward to your participation!