
[GSoC Proposal 2017] Single Image Batch Upload
Closed, Declined · Public

Description

Project Title

Single Image Batch Upload

Personal Information

Name: Fotso Fono Kevin Larry
Email: kfotso61@gmail.com
IRC Nick: djff
Location: Cameroon
Github: Github Profile Page

Project Mentors

Primary mentor: @Basvb (python, batch uploading experience, Commons)
Co-mentors: @tom29739 (code review, python and tools) and @zhuyifei1999 (labs, commons, python)

Synopsis / Project summary

Single Batch Upload is a project meant to ease uploading images to Wikimedia Commons. When someone uploads a whole batch of images released by a GLAM, many of the uploaded images may turn out not to be useful. Alternatively, uploading images one at a time wastes time and becomes tedious when there are many images. This project therefore aims to build a framework that automates the process and removes that wasted time, while uploading only what is relevant to Commons.

Prerequisite work

  • Github link to the project (as of now): Single Batch Upload project on Github.
  • Some prerequisite work has already been done on the project:
    • Uploading images to my Commons account using the UploadRobot from pywikibot (Commons Images). We noticed that UploadRobot interacts heavily with the terminal rather than with the calling script. So, for this framework and to have more control during the upload process, two alternatives will be considered: pywikibot.site.APISite.upload or pywikibot.page.FilePage.upload.
    • Creating a simple Flask app that uploads items from the National Archive when the URL app/upload/GLAM/ID is opened in a browser. T161332, check here.
    • Implementing a simple title generator. The sample NationalArchieve script was uploaded to create a dictionary holding possible title parameters, and these parameters are sent to a title generator to create a descriptive title. T161337, sample here.
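
The title generator described above could look roughly like the following. This is a minimal sketch, not the code from the linked repository: the function names `generate_title` and `is_valid_title`, the " - " separator, the parameter keys, and the 240-byte length limit are all assumptions made for illustration.

```python
import re

# Characters that MediaWiki rejects in page titles.
_FORBIDDEN = re.compile(r"[#<>\[\]|{}]")
# Assumed upper bound on the title length (in bytes) for this sketch.
MAX_TITLE_BYTES = 240


def generate_title(title_params, extension):
    """Join the non-empty title parameters and append the file extension."""
    parts = [str(value).strip() for value in title_params.values()
             if value and str(value).strip()]
    base = _FORBIDDEN.sub("", " - ".join(parts))
    base = re.sub(r"\s+", " ", base).strip()  # collapse repeated whitespace
    return "{}.{}".format(base, extension.lstrip(".").lower())


def is_valid_title(title):
    """Check that the title has a name, an extension, and a sane length."""
    base, dot, ext = title.rpartition(".")
    return bool(base and dot and ext) and len(title.encode("utf-8")) <= MAX_TITLE_BYTES


# Hypothetical parameters, as a metadata-mapping script might produce them.
params = {"description": "Harbour view", "date": "1931", "glam_id": "12345"}
title = generate_title(params, ".JPG")
```

Keeping validation separate from generation lets the calling code decide what to do with a rejected title, for example asking the user for a different one.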

Project Goals (Verifiable Deliverables)

  • Creating a UI design for users
  • Creating a Flask app backend to manage GET requests and implement endpoints.
  • Implement metadata-mapping of GLAMs
  • Implement License verification
  • Implement OAuth to authenticate users on Commons during upload
  • Design and Implement a testing framework to write tests
  • Implement logging function (to track and detect errors in case of unsuccessful uploads for technical purposes)
  • Create documentation (for both technical and regular users)
  • Host the application on Wikimedia Tool Labs

Detailed project description

Introduction
Currently, when somebody does a batch upload (uploading many images released by an archive or museum (GLAM)), all images get uploaded, and many of them might never be used or are not that useful. The other option, uploading released images one by one, wastes a lot of time. A solution would be to provide the metadata mappings that make an upload possible, plus a framework that uses these mappings to upload a single image (or a small subset of images) from a GLAM. In the ideal final version, this would allow a GLAM to have an "upload to Wikimedia Commons" button on its website.

Proposed Implementation Approach
A Flask app will be created and the necessary endpoints implemented. The folder structure will be as follows.

  • glam_templates: Holds the metadata-mapping scripts that parse each GLAM and get the items to be uploaded. It also holds generic scripts like Generating_titles and Upload_to_commons.
  • templates: Holds the UI templates rendered when the app runs, which let a user select a GLAM and either enter an identifier, select images/files to upload, or upload all files.
  • static: Holds all client-side scripts and UI stylesheets (css).
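
One way glam_templates could expose its per-GLAM mapping scripts to the rest of the app is a small registry keyed by GLAM name. The sketch below is an assumption about how this might be organised, not the structure used in the repository: `register_mapping`, `get_mapping`, the "ExampleGLAM" source, and its record fields are all hypothetical.

```python
# Registry mapping a lower-cased GLAM name to its metadata-mapping function.
MAPPINGS = {}


def register_mapping(glam_name):
    """Decorator that registers a metadata-mapping function for one GLAM."""
    def decorator(func):
        MAPPINGS[glam_name.lower()] = func
        return func
    return decorator


@register_mapping("ExampleGLAM")  # hypothetical GLAM, for illustration only
def map_example_glam(record):
    """Translate a raw ExampleGLAM record into the fields the framework uses."""
    return {
        "description": record.get("caption", ""),
        "date": record.get("year", ""),
        "glam_id": record.get("id", ""),
        # Take whatever follows the last dot of the file URL as the extension.
        "extension": record.get("file_url", "").rpartition(".")[2],
    }


def get_mapping(glam_name):
    """Return the mapping function for a GLAM, or raise LookupError."""
    try:
        return MAPPINGS[glam_name.lower()]
    except KeyError:
        raise LookupError(
            "No metadata mapping registered for GLAM %r" % glam_name
        ) from None
```

With this layout, supporting a new GLAM means adding one decorated function, and the upload endpoint only needs the GLAM name from the URL to find the right mapping.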

Proposed Flow of the application

  • A user selects a GLAM from the UI provided, enters an identifier or selects elements (images), then clicks the button "upload to commons". This button will trigger a GET request to the Flask app using the link app_name/upload/GLAM/ID.
  • Use OAuth to authenticate the user's account.
  • The GLAM name is then used to get the corresponding metadata_mapping script from glam_templates. Using the fields extracted by this script, a dictionary holding possible title_params is generated and passed to the Generating_title function.
  • Use the licensing scheme to check each file's license.
  • In this function, a title is generated from a combination of the params received, and the image's extension (obtained from the metadata_mapping) is appended. The title is then passed through a validation method and, if valid, returned to the calling function.
  • The image title is then sent with the metadata to the upload method from pywikibot.
  • A message is sent upon success or failure. Images that were not uploaded are also shown to the user.
  • Logs on the progress are kept in the log file.

Alternatively, a GLAM can just click a button, provided on its website, to upload all its recently prepared images. This button will trigger the upload function of our framework and carry the GLAM's name. Once triggered, steps 2 through the last step above are executed.
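
The license check, upload, reporting, and logging steps of this flow can be sketched as a single loop. This is only an illustration under stated assumptions: `process_items`, the item dictionary layout, and the `ALLOWED_LICENSES` set are hypothetical placeholders (the set is not a statement of Commons licensing policy), and the `uploader` argument stands in for a wrapper around pywikibot.site.APISite.upload or pywikibot.page.FilePage.upload.

```python
import logging

logger = logging.getLogger("singlebu")

# Placeholder set of accepted license identifiers, assumed for this sketch.
ALLOWED_LICENSES = {"cc0", "cc-by-4.0", "cc-by-sa-4.0", "pd"}


def process_items(items, uploader):
    """Check each item's license, upload it, and collect the outcome.

    `items` is a list of dicts with at least "title" and "license" keys.
    `uploader` is any callable that uploads one item and raises on failure.
    Returns (uploaded_titles, failed) where failed holds (title, reason) pairs.
    """
    uploaded, failed = [], []
    for item in items:
        title = item["title"]
        if item.get("license", "").lower() not in ALLOWED_LICENSES:
            logger.warning("Skipping %s: license not accepted", title)
            failed.append((title, "license not accepted"))
            continue
        try:
            uploader(item)
        except Exception as exc:  # report any upload failure to the user
            logger.error("Upload of %s failed: %s", title, exc)
            failed.append((title, str(exc)))
        else:
            logger.info("Uploaded %s", title)
            uploaded.append(title)
    return uploaded, failed
```

Passing the uploader in as a callable keeps the loop testable without network access, and the returned `failed` list is what the UI would show as the images that were not uploaded.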

Development Schedule / Timeline

May 4 - May 30 (3.6 weeks)

  • Create relevant tasks on phabricator for specific discussions.
  • Gather list of GLAMs with help of mentors (classifying them in terms of relevance)
  • Request an account on Wikimedia Tool Labs
  • Look more in-depth into tools like flickrripper, mapillary and GWToolset to understand their implementations, strengths and weaknesses, and to identify where singleBU (single batch upload) will thrive.
  • Research OAuth implementation in Wikimedia
  • Place request for account on tool labs
  • Document and report findings so as to decide on the final structure of singleBU
  • Implement the Design document of singleBU

June 1 - June 10 (1.3 weeks)

  • Create the Flask app from scratch, design and implement all the necessary endpoints.
  • Write tests for the work done.
  • Push the application to Tool Labs (hoping access will be granted by then); subsequent updates will be pushed regularly.
  • Document and report results to mentors.

June 11 - June 25 (2 weeks)

  • Implement the UI templates.
  • Implement OAuth to authenticate user account before uploading to Commons.
  • Deploy the framework to toollabs.
  • Develop a scheme to implement the License verification.
  • Document, report and write automated tests for work done.

June 26 - June 30 (5 days)

  • First Evaluation (Phase I).

June 30 - July 23 (3.3 weeks)

  • Set up a meeting with a GLAM member to discuss the tool.
  • Implement metadata-mapping for all GLAMs collected. It may be a generic form common to all GLAMs or specific to each GLAM, depending on the conclusions of the above research (the current sample on my Github shows a specific implementation).
  • Write generic scripts common to GLAMs like GenerateTitle and UploadToCommons.
  • Write automated test scripts for work done and track execution using a log file.
  • Document and report to mentor.

July 24 - July 28 (5 days)

  • Second Evaluation (Phase II).

July 28 - August 10 (2 weeks)

  • Continue implementing metadata_mapping for GLAMs, if any remain.
  • Implement License validation.
  • Finalize implementation of logs and automated tests.
  • Document work done and report to mentors.

August 11 - August 20 (1.4 weeks)

  • Write a wiki page about the tool.
  • Keep improving the application based on user feedback.
  • Improve and Review documentation (both technical and user documents).
  • Report to mentors.

Work after GSoC 2017 on the project

I am still very new to open source organizations, and I wish to make a contribution that will impact people positively and free of charge. So I intend to see to this project's continuity by improving the framework based on feedback from the community and the users, even after this internship period.

Time Availability during GSoC

I will put in at least 40 hours per week on this project (4am-8am and 3pm-9pm daily, UTC+1:00) and will occasionally code during weekends, probably in the evenings (Saturdays and/or Sundays). I will regularly inform my mentors of my progress and update my wiki report page or the project's Phabricator ticket. I will always be available to respond to my mentors' messages, with a maximum delay of 15 minutes.

I am in my last year of computer engineering. The year consists of one semester of lessons (32 credits, already completed), one semester of required internship, and a final project to defend. I intend to use this GSoC as my internship (30 credits), and the final product will be defended in August as the work done during the internship. So if I am selected for GSoC, I will work on it full time until August, when I will have some meetings with my academic supervisor.

Why Wikimedia Foundation (WMF)?

I heard of the Wikimedia Foundation from a friend who did GSoC last year and started a movement for Wikimedia Cameroon, which I decided to join. I am an advocate of openness, and this organisation aligns with my principles: it focuses on encouraging the growth and distribution of free content to help the world in general and my continent in particular. Hence I am highly motivated to contribute my knowledge and technical know-how to the movement in my own way.

Some Past Experience

  • Trounce flow: worked with a UK-based startup as a freelancer to extract, process and visualize financial data from the Bloomberg API.
  • Contact Directory: used to manage all sorts of contacts in the cloud; can be accessed via mobile.
  • Facial Recognition, Facial recognition using openCV for school attendance system.

My Programming skills

  • Programming Languages: Javascript (Good), PHP (Above Average), Python (Very good), Java (Very good).
  • Frameworks / Tools: Django (Excellent), Flask (Good), git/Github/gitlab, SSH.

About Me

  • Currently completing my last year in computer engineering from the University of Buea, Cameroon.
  • I am a member of the GDG Buea (Google Developer Group Buea) and like to promote women in tech by organizing PL/meetups for girls in high school.
  • I was a mentor at the programming language (PL) meetup event organized by GDG Buea in 2016, which offered non-techies a free introduction to programming.
  • I am an advocate of Openness and have taken part in local Open conferences to promote openness in the society.
  • More about my personal attributes: Plum Test.

Event Timeline

Hello @Basvb , @tom29739 , @zhuyifei1999 , Please, I just submitted a draft proposal. I hope to get your review before submitting the final copy.
Thanks a lot in advance.

Proposed Implementation Approach

LGTM

This image title is now sent with the metada to the UploadRobot method from pywikibot.

use pywikibot.site.APISite.upload or pywikibot.page.FilePage.upload; this is what pywikibot.specialbots.UploadRobot actually calls, but UploadRobot heavily interacts with the command line interface and is therefore not very suitable for web tools.

write automated test for work done

Awesome

June 24 - July 28 (5 days) ... Implement metadata_mapping for GLAMs if any

Is this duplicate with June 30 - July 23 (3.3 weeks) ... Implement metadata-mapping for all GLAMS collected?

Deployment of the application on Wikimedia Tool Labs (test instance) so that members of the community can test and give feedback for improvement.

Is it possible to deploy this early and change this to announcing to the community? It would be much easier to test things if it is already on tool labs IMO.

Creating of an official Wikimedia repo with the help of mentors (in the Wikimedia Organisation on Github) for this project so that other can contribute and development will continue after GSoC.

I'm sorry but this is probably not gonna happen. Repos in that org are mostly for production services (i.e. things under wikimedia.org domain). You're free to host the repo under your own account, create a new org, or move to other non-official orgs such as https://github.com/toollabs/

Hello @zhuyifei1999, thanks a lot for your feedback, I will make the necessary modifications. Yes, I totally agree with you about UploadRobot, as I noticed it prints a lot of info to the terminal (thanks for the guidance on this).

Hi @djff, thank you for the overall very good proposal. I'll give some remarks as requested:

  1. Could you list your working hours (and UTC-time) to the details?
  2. Synopsis: good overall, maybe you can make it a bit more clear what the project does to tackle the described problem (using pre-made metadata mappings to allow everybody with a Commons account to upload single files from a GLAM). Aah I see that the detailed project description tackles this, fine.
  3. For T161332 you did not write any replies on that task itself, where can we find the relevant code (which parts of your github)?
  4. Deliverables: Good idea with the logging, maybe a good extra is to have some landing pages (on Commons?) where people can propose new templates and changes to templates, but we can also consider that to be my task.
  5. UI: I think it is better to avoid the term "UI templates" otherwise templates have a lot of different meanings within this project (Commons also has templates and we can have a template for metadata-mappings). I also think it is not the best descriptive terms. Maybe UI designs or simply UI.
  6. Proposed flow: Maybe license can better be an earlier step in the flow. As without a valid license we do not need a title or wikitext file description. I think that the OAuth step is even before that (see UI, OAuth, license check, then the rest)
  7. "a GLAM can just click on a button which will be provided on its website". Not just the GLAM but any person (if they have a Commons account) should be able to use the tool.
  8. May 4- May 30: looks very good. I think the GWToolset would also be a good tool to look at and get some ideas from. Maybe creating the relevant tasks in Phabricator is also an option here, although it will be something that you'll be doing and updating continuously.
  9. Toollabs integration is something we should start with immediately, and updating the tool there should preferably be done weekly or even more often.
  10. OAuth implementation is something for the first weeks as well, as it is very important to the tool working for other users. (you could for example move the title generation to later in the process).
  11. "Deployment of the application on Wikimedia Tool Labs" is a bit late in the process. I think we should aim to have a working MVP at the end of June, and after that we focus on core extensions and extras which improve the tool. This allows us to also gather some feedback in the first half of July from others and use this feedback for improvements.
  12. Nice to see that you intend to stay involved even after GSoC.
  13. Time availability: Could you please provide more information on which months you will have other activities (study), how much time these take each week, and how many credits your classes are (please also provide the normal total credits per year).
  14. If you think that it is interesting I can try to setup a meeting with somebody from a GLAM who deals with data-release/Wikimedia. We could discuss what they think about the idea and maybe about an upload button from their site and about their mapping. This could be something we do at the start or after the MVP is finished (I think that is the best time).

Thank you for your proposal, and hopefully the points above can help you in creating an even better proposal.

@djff You are applying only for GSOC right? Removing Outreachy tag, please correct if I'm wrong here! thanks.

djff updated the task description.

Hello @srishakatux , yes, I am applying for GSOC.

@djff: A quick heads up: don't forget to set your proposal at the GSOC website to final before the deadline

Sorry for the difficulties between Google and the University, hopefully we can still see you around (maybe in a next round?). I'll be closing the task, you're always welcome to come and discuss with us.

Aklapper changed the task status from Resolved to Declined. May 7 2017, 4:05 PM

@Basvb: That makes this task declined though. :)