
[Outreachy 25] Rewrite Imagebulk tool to scale up
Closed, ResolvedPublic

Description

IMPORTANT: Make sure to read the Outreachy participant instructions and communication guidelines thoroughly before commenting on this task. This space is for project-specific questions, so avoid asking questions about getting started, setting up Gerrit, etc. When in doubt, ask your question on Zulip first!

Brief summary

Wikimedia Commons is a media file repository making available public domain and freely licensed educational media content (images, sound, and video clips) to everyone. It acts as a common media repository for the various projects of the Wikimedia Foundation.

In Wikimedia Commons, there is no file download button at all. Even for a single file, you have to open the file and save it manually by right-clicking. As part of the Indic-TechCom initiative to provide technical support to the Indic community, we regularly get feedback from the community. In one such piece of feedback, they asked us to develop a tool that can download multiple images from Wikimedia Commons, so we created the Imagebulk tool. It saves community members' time, for example, when they want to download the photos from a Wikimedia event.

Imagebulk is a tool that allows users to download Wikimedia Commons files in bulk as a ZIP. Users provide a list of URLs of file pages or categories and download the media files as a single ZIP. Since we don't have access to Wikimedia Commons's filesystem, the tool downloads the files over HTTP and then compresses them into a ZIP file. Because of this, users can download only 50 files at once. Last year, I added a Celery worker to scale up, but this does not work well on Toolforge. This project now has the following aspects:

  • Design UI of the tool
  • Based on the design, write frontend code in Vue.js
  • Validate the user's input list and show thumbnails of the listed files on the frontend
  • Convert the backend into an API model
  • Improve the Celery implementation to handle large requests
  • Allow users to download media from other Wikimedia projects
  • Deploy the app on CloudVPS (Debian Linux) instead of Toolforge
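
The download-over-HTTP and ZIP step described in the summary can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the tool's actual code: `build_zip`, `fetch_over_http`, and the index-prefixed file naming are hypothetical; only the 50-file limit comes from the description.

```python
import io
import urllib.request
import zipfile
from typing import Callable, Iterable

MAX_FILES = 50  # per-request limit mentioned in the description


def fetch_over_http(url: str) -> bytes:
    """Download one file over HTTP (hypothetical helper, not the tool's code)."""
    with urllib.request.urlopen(url, timeout=30) as response:
        return response.read()


def build_zip(urls: Iterable[str],
              fetch: Callable[[str], bytes] = fetch_over_http,
              max_files: int = MAX_FILES) -> bytes:
    """Fetch each URL and return the bytes of a ZIP archive containing them."""
    url_list = list(urls)
    if len(url_list) > max_files:
        raise ValueError(f"at most {max_files} files per request, got {len(url_list)}")

    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", zipfile.ZIP_DEFLATED) as archive:
        for index, url in enumerate(url_list):
            # Prefix with the index so two URLs with the same basename do not collide.
            name = f"{index:03d}_{url.rsplit('/', 1)[-1]}"
            archive.writestr(name, fetch(url))
    return buffer.getvalue()
```

Passing `fetch` as a parameter keeps the sketch testable without network access. Note that it holds the whole archive in memory, which is one reason a larger limit would need a different approach.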

Skills required

  • HTML & CSS
  • Intermediate JavaScript & Python
  • Flask with asynchronous programming
  • Celery worker queue
  • Vue.js, with Vuetify and Vuex
  • Cloud deployment

Repository

https://github.com/indictechcom/imagebulk

Possible mentor(s)

Microtasks

Task 1:

  • Review current app UI at Toolforge.
  • Create a new UI wireframe for the web app interface showing how a user will interact with the UI to download images in bulk.
  • Share the UI wireframe with Jayprakash12345 and SGautam_WMF on Zulip.

Event Timeline


@Jayprakash12345 Thanks for adding this task. I've got a few questions and suggestions:

  • Could you add a little bit of background for developing this tool? How did the idea come up, who are the users, and how will they benefit from it? What is the intended use of the downloaded images?
  • Could you consider moving this (https://github.com/Jayprakash-SE/imagebulk) to a Wikimedia repo, so that the future intern's contributions gain more visibility and are good for their portfolio? It would also help draw more contributors in the future.
  • If not 50, how many downloads are you considering making? Maybe you know the pros and cons of it already. Could you perhaps consider taking expert advice on the implications and best practices to keep in mind when technically implementing it?

Could you add a little bit of background for developing this tool? How did the idea come up, who are the users, and how will they benefit from it? What is the intended use of the downloaded images?

Added.

Could you consider moving this (https://github.com/Jayprakash-SE/imagebulk) to a Wikimedia repo, so that the future intern's contributions gain more visibility and are good for their portfolio? It would also help draw more contributors in the future.

Moved to https://github.com/indictechcom/imagebulk.

If not 50, how many downloads are you considering making? Maybe you know the pros and cons of it already. Could you perhaps consider taking expert advice on the implications and best practices to keep in mind when technically implementing it?

Once we have asynchronous request handling, we can go ahead with any number. Since Wikimedia Commons and the Wikimedia Cloud infrastructure are close to each other (as some WMF staff told me a long time ago), there should not be any problem. But I am thinking of capping the ZIP size at 5 GB or so. I am also considering asking Bryan Davis for advice.

@Jayprakash12345 Looks good! It makes a lot of sense now. Do you have a co-mentor in mind? Do you need help finding one? We had a few mentors express interest in another project, so if you are still looking for someone, I or @Slst2020 can help. I think you can now upload this proposal on the Outreachy portal (the deadline is September 30th).

Tagging @bd808 for his awareness in case he has any immediate thoughts to share on your comment above around async request handling.

@srishakatux Yes, we need a co-mentor for this project. Someone with design skills would be great for us. Thanks in advance.


@Jayprakash12345 If you meant non-code help to design the UI/UX of the platform, then yes, I have someone in mind. I am asking them for availability.

If not 50, how many downloads are you considering making? Maybe you know the pros and cons of it already. Could you perhaps consider taking expert advice on the implications and best practices to keep in mind when technically implementing it?

Once we have asynchronous request handling, we can go ahead with any number. Since Wikimedia Commons and the Wikimedia Cloud infrastructure are close to each other (as some WMF staff told me a long time ago), there should not be any problem. But I am thinking of capping the ZIP size at 5 GB or so. I am also considering asking Bryan Davis for advice.

5 GB seems like quite a large zip file, both for generation and for download by the user. From a network point of view, the statement that the image hosting for Commons and the Cloud VPS instances are near to each other is true. Other issues to consider are the storage needs on the Cloud VPS instances for these zip archives (How long will an archive be kept after generation before being discarded? When you run out of storage quota, will there be a way to clean things up, or will the app be broken until more storage is added?) and the stability of streaming very large binary zip archives back to the requesting user across the various networks and reverse proxies.
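
One possible answer to the retention and quota questions is a periodic cleanup pass. The sketch below is only an illustration: `prune_archives`, the `*.zip` naming, and both limits are assumptions, and nothing in this thread says how the tool actually stores its archives.

```python
import time
from pathlib import Path
from typing import List, Optional


def prune_archives(directory: Path, max_age_seconds: float, max_total_bytes: int,
                   now: Optional[float] = None) -> List[Path]:
    """Delete expired archives, then the oldest ones until the quota is met."""
    now = time.time() if now is None else now
    removed: List[Path] = []
    # Oldest first, so quota enforcement drops the stalest archives.
    archives = sorted(directory.glob("*.zip"), key=lambda p: p.stat().st_mtime)

    kept = []
    for path in archives:
        if now - path.stat().st_mtime > max_age_seconds:
            path.unlink()
            removed.append(path)
        else:
            kept.append(path)

    # Second pass: enforce the storage quota on what is left.
    total = sum(p.stat().st_size for p in kept)
    for path in kept:
        if total <= max_total_bytes:
            break
        total -= path.stat().st_size
        path.unlink()
        removed.append(path)
    return removed
```

Run from a cron job or a periodic worker task, something like this would keep the app from breaking when the quota fills up, at the cost of users occasionally finding an expired download link.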

It appears that the current tool is not handling very much traffic. Toolviews shows it only handling 123 requests from January 1, 2022 through June 30, 2022:

$ curl -s https://toolviews.toolforge.org/api/v1/tool/imagebulk/daily/2022-01-01/2022-06-30 |
   jq '.results[].imagebulk' |
   awk '{sum+=$1;} END{print sum;}'
123
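
For anyone who prefers Python to the jq/awk pipeline, a roughly equivalent sketch follows. The endpoint URL is the one from the command above. The assumption that `results` holds per-day entries keyed by tool name is inferred from the jq filter, not from API documentation, and `total_requests` and `fetch_daily` are hypothetical names.

```python
import json
import urllib.request
from typing import Any


def total_requests(payload: dict, tool: str) -> int:
    """Sum the per-day counts for one tool, like jq '.results[].<tool>' piped to awk."""
    results: Any = payload.get("results", {})
    # `.results[]` in jq iterates the values of an object or the items of an array.
    days = results.values() if isinstance(results, dict) else results
    return sum(day.get(tool, 0) for day in days)


def fetch_daily(tool: str, start: str, end: str) -> dict:
    """Fetch the daily counts from the Toolviews endpoint used in the command above."""
    url = f"https://toolviews.toolforge.org/api/v1/tool/{tool}/daily/{start}/{end}"
    with urllib.request.urlopen(url, timeout=30) as response:
        return json.load(response)
```

Usage would be `total_requests(fetch_daily("imagebulk", "2022-01-01", "2022-06-30"), "imagebulk")`.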

@Jayprakash12345 If you meant non-code help to design the UI/UX of the platform, then yes, I have someone in mind. I am asking them for availability.

Yes, I meant a UI/UX person. Please let us know, Srishti Ji.

5 GB seems like quite a large zip file, both for generation and for download by the user.

Thanks for letting us know. We will contact you to decide on a new limit.

@Jayprakash12345 hope you're doing good. I am happy to provide design support for this task in between Jan-March next year. Do you have exact dates in mind by when you will need the design support?

@Jayprakash12345 Remember to list some microtasks both in the description of this task here on Phabricator and on the Outreachy website. The application period begins Saturday. TY!

@Jayprakash12345 hope you're doing good. I am happy to provide design support for this task in between Jan-March next year. Do you have exact dates in mind by when you will need the design support?

Namaste Gautam Ji, we just need your time during the initial phase, mostly in January, to review the wireframe. After that, one final review before pushing the app to prod.

Hello @Jayprakash12345
I'm interested in this project and just made a PR.

@Jayprakash12345 hope you're doing good. I am happy to provide design support for this task in between Jan-March next year. Do you have exact dates in mind by when you will need the design support?

Namaste Gautam Ji, we just need your time during the initial phase, mostly in January, to review the wireframe. After that, one final review before pushing the app to prod.

Thanks for sharing

Hello @SGautam_WMF, I'm Enow, an Outreachy intern.

I just submitted a PR for the project.
Please kindly review and let me know if there's anything I should amend.

Thanks

I confirm that @Enow97 has submitted Task 1.

Hi @Jayprakash12345, I just updated the design to include the image thumbnails, and I added 2 more pages.

@Enow97 Hello! I didn't see any other task related to your project, so I am assigning this to you. As it's been a few weeks since the internship started, I am asking all interns to share a few updates (in 3-4 sentences) on their project progress in a comment on the relevant Phabricator task. I'd encourage you to do the same. For other reminders, please see my message on Zulip. cc @Jayprakash12345


Hi @srishakatux, I have completed task 1 of this project and it's available at https://www.figma.com/proto/xqTX3wLPqtl6UlCJuEW5Bp/ImageBulk-Wireframe?node-id=1%3A2&scaling=scale-down&page-id=0%3A1&starting-point-node-id=1%3A2

After completing the wireframe proposal, I was advised by @Jayprakash12345 to add a Dockerfile to the project before moving on to the next main functionality of the tool. I recently added a Dockerfile together with a docker-compose setup to containerize the Celery, Flask, and Redis servers. All configuration has been done, but I am currently facing some issues with Docker. Once that's out of the way, I'll move on to extending the download capacity of the tool.

@Enow97 That sounds great! If you haven't yet, please consider getting in touch w/ @SGautam_WMF to gather design feedback on your wireframes. cc @Jayprakash12345

Hello @Enow97, thanks for sharing the Figma prototype. Is it possible to get a link to the Figma file so I can open it and have a detailed look? @srishakatux what would be the best way to discuss designs with Enow?

Hi @SGautam_WMF. I've sent you an invite to the Figma file at this address: sgautam@wikimedia.org.
Thanks

This comment was removed by Enow97.

@SGautam_WMF You could share your feedback on the design here on the Phabricator task. Alternatively, you can set up a meeting to discuss the designs. I will leave it to you to decide what works best. If you decide to take the meeting route, @Enow97 can send you an email (at the address mentioned above) to coordinate a date and time.

@Enow97 what will be the best way to connect with you over a call to discuss the designs?

@Enow97 I have left some basic comments in the Figma file. A call with you will help me understand the intent behind the design decisions in the mockups.

@Enow97 what will be the best way to connect with you over a call to discuss the designs?

I guess we could connect with google meet, if that's okay by you

@Enow97 I have left some basic comments in the Figma file. A call with you will help me understand the intent behind the design decisions in the mockups.

Okay. I'm available for a call. Could be meet, or skype.

@SGautam_WMF I have responded to your comments in the figma file.

@Enow97 could you share your email ID with me to schedule a call? Also, what timezone are you based in, so I can book accordingly?

Hello @Enow97 will you be free tomorrow to have a call?

Hi @SGautam_WMF. My timezone is WAT.

Yes I will be available tomorrow for a call.

Here's my email: enow9315@gmail.com

thanks @Enow97, I have sent you a meeting invite.

@Enow97 @Jayprakash12345 Hello! Could you both share an update on the outcomes of the project completed as part of the internship?

I cannot assess Enow's contribution from the GitHub repository: https://github.com/indictechcom/imagebulk, and apparently, Enow's Github account no longer exists. Please shed some light on this. Could you also share how the design feedback from @SGautam_WMF was taken into consideration?

I'd also appreciate your help in documenting any final outcomes of the project here: https://www.mediawiki.org/wiki/Outreachy/Past_projects#Round_25. Thank you!

Hi @srishakatux,

I am really sorry for being so late.


Could you both share an update on the outcomes of the project completed as part of the internship?

I have added the project to https://www.mediawiki.org/wiki/Outreachy/Past_projects#Rewrite_Imagebulk_tool_to_scale_up. Please take a look.


I cannot assess Enow's contribution from the GitHub repository: https://github.com/indictechcom/imagebulk, and apparently, Enow's Github account no longer exists.

I think I told you this by email, but it is worth mentioning here as well. I think the account has been renamed to DanielGraham123. See all contributions at https://github.com/indictechcom/imagebulk/commits?author=DanielGraham123.


Could you also share how the design feedback from @SGautam_WMF was taken into consideration?

Yes, the feedback from @SGautam_WMF has been incorporated into the code.


I'd also appreciate your help in documenting any final outcomes of the project

Done at https://www.mediawiki.org/wiki/Outreachy/Past_projects#Rewrite_Imagebulk_tool_to_scale_up.

Enow has done their part in writing the code. Currently, only the deployment of the tool is left. I requested the CloudVPS project at T336109: Request creation of imagebulk VPS project. Once it is created, I can start deploying the tool.

It is also worth mentioning that Enow's blog posts are on Meta-Wiki at https://meta.wikimedia.org/wiki/User:Enow97#Blog_Posts. Please don't confuse them with a blog on MediaWiki.org.

Thank you so much.