Project Title
Single Image Batch Upload
Personal Information
Name: Fotso Fono Kevin Larry
Email: kfotso61@gmail.com
IRC Nick: djff
Location: Cameroon
Github: Github Profile Page
Project Mentors
Primary mentor: @Basvb (python, batch uploading experience, Commons)
Co-mentors: @tom29739 (code review, Python and tools) and @zhuyifei1999 (labs, Commons, Python)
Synopsis / Project summary
Single Batch Upload is a project meant to ease the process of uploading images to Wikimedia Commons. When someone uploads an entire batch of images released by a GLAM, many of the uploaded images may turn out not to be useful. Alternatively, uploading images one after the other wastes time and becomes tedious when there are many images. This project therefore aims to build a framework that automates the process and removes this wasted effort, while uploading only what is relevant to Commons.
Prerequisite work
- Github link to the project (as of now): Single Batch Upload project on Github.
- So far some prerequisite work has been done on the project:
- Uploading images to my Commons account using the UploadRobot from pywikibot (Commons Images). We noticed that UploadRobot interacts heavily with the terminal rather than with the calling script. To have more control during the upload process, two alternatives will be considered for this framework: pywikibot.site.APISite.upload or pywikibot.page.FilePage.upload.
- Creating a simple Flask app that uploads items from the National Archives when the URL app/upload/GLAM/ID is opened in a browser. T161332, check here.
- A simple title generator has been implemented. The sample NationalArchieve script builds a dictionary of possible title parameters, and these parameters are passed to a title generator to create a descriptive title. T161337, sample here.
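The title-generation step described above could look roughly like the following. This is only a minimal sketch and not the code in the project repository: the function names, the parameter keys ("description", "date", "identifier"), the separator, and the 240-byte budget are all assumptions made for illustration.

```python
import re

# Characters replaced before building the title. The exact set is an
# assumption for this sketch; the real rules come from MediaWiki/Commons.
_FORBIDDEN = re.compile(r"[#<>\[\]|{}:/]")
MAX_TITLE_BYTES = 240  # assumed safety margin for the whole file name


def generate_title(params, extension, order=("description", "date", "identifier")):
    """Build a descriptive file title from a dict of title parameters."""
    # Keep only the parameters that are present and non-empty, in a fixed order.
    parts = [str(params[key]).strip() for key in order if params.get(key)]
    if not parts:
        raise ValueError("no usable title parameters")
    base = " - ".join(parts)
    base = _FORBIDDEN.sub(" ", base)           # drop problematic characters
    base = re.sub(r"\s+", " ", base).strip()   # collapse repeated whitespace
    ext = "." + extension.lower().lstrip(".")
    # Truncate on a byte budget so the extension always fits.
    budget = MAX_TITLE_BYTES - len(ext.encode("utf-8"))
    base = base.encode("utf-8")[:budget].decode("utf-8", errors="ignore").rstrip()
    return base + ext


def is_valid_title(title):
    """Cheap sanity check applied before the title is handed to the uploader."""
    name, dot, ext = title.rpartition(".")
    return (
        dot == "."
        and bool(name.strip())
        and bool(ext)
        and not _FORBIDDEN.search(title)
        and len(title.encode("utf-8")) <= 255
    )
```

A caller would pass the dictionary produced by a GLAM's mapping script, for example `generate_title({"description": "Map of Buea", "date": "1969"}, "JPG")`, and check the result with `is_valid_title` before uploading.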
Project Goals (Verifiable Deliverables)
- Creating a UI design for users
- Creating a Flask app backend to manage GET requests and implement endpoints.
- Implement metadata-mapping of GLAMs
- Implement License verification
- Implement OAuth to authenticate users on Commons during upload
- Design and Implement a testing framework to write tests
- Implement a logging function (to track and detect errors in case of unsuccessful uploads, for technical purposes)
- Create documentation (for both technical and regular users)
- Host the application on Wikimedia Tool Labs
Detailed project description
Introduction
Currently, when somebody does a batch upload (uploading a lot of images released by an archive or museum (GLAM)), all images get uploaded, and many of them might never be used or are not that useful. The other option, uploading released images one by one, wastes a lot of time. A solution would be to provide the metadata-mappings that make an upload possible, together with a framework that uses these mappings to upload a single image (or a small subset of images) from a GLAM. In the ideal final version, this would allow a GLAM to have an "upload to Wikimedia Commons" button on its website.
Proposed Implementation Approach
A Flask app will be created and the necessary endpoints implemented. The folder structure will be as follows.
- glam_templates: Holds the metadata-mapping scripts for each GLAM, which parse and fetch the items to be uploaded. It also holds generic scripts like Generating_titles and Upload_to_commons.
- templates: Holds the UI templates that are rendered when the app runs, so that a user of the app can select a GLAM and either enter an identifier, select images/files to upload, or upload all files.
- static: Holds all client-side scripts and UI stylesheets (CSS).
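One way the per-GLAM scripts in glam_templates could be looked up by name is a small registry. The sketch below is an assumption for illustration, not the project's actual layout: the names `GLAM_MAPPERS`, `register_glam`, `get_mapper` and `map_national_archives`, as well as the record keys ("title", "id", "rights", "url"), are hypothetical.

```python
import os
from urllib.parse import urlparse

# Hypothetical registry: GLAM name (lower-cased) -> metadata-mapping function.
GLAM_MAPPERS = {}


def register_glam(name):
    """Decorator that registers a metadata-mapping function under a GLAM name."""
    def decorator(func):
        GLAM_MAPPERS[name.lower()] = func
        return func
    return decorator


@register_glam("NationalArchives")
def map_national_archives(record):
    """Map one raw record (keys are assumed) to the fields the framework needs."""
    url = record.get("url", "")
    # Take the extension from the URL path, without the leading dot.
    extension = os.path.splitext(urlparse(url).path)[1].lstrip(".").lower()
    return {
        "title_params": {
            "description": record.get("title"),
            "identifier": record.get("id"),
        },
        "license": record.get("rights", "unknown"),
        "file_url": url,
        "extension": extension,
    }


def get_mapper(glam_name):
    """Return the mapping function for a GLAM, or raise if none is registered."""
    try:
        return GLAM_MAPPERS[glam_name.lower()]
    except KeyError:
        raise LookupError(
            f"no metadata mapping registered for GLAM {glam_name!r}"
        ) from None
```

With this arrangement, supporting a new GLAM would only require adding one decorated function, and the rest of the app never needs to know which GLAMs exist.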
Proposed Flow of the application
- A user selects a GLAM from the UI provided, enters an identifier or selects elements (images), then clicks on the "upload to commons" button. This button triggers a GET request to the Flask app using the link app_name/upload/GLAM/ID.
- OAuth is used to authenticate the user's account.
- The GLAM name is then used to get the corresponding metadata_mapping script from glam_templates. Using the identifier and the other fields extracted by this script, a dictionary holding possible title_params is generated and passed to the Generating_title function.
- The licensing scheme is used to check each file's license.
- In the Generating_title function, a title is generated from a combination of the params received and the image's extension (obtained from the metadata_mapping) is appended to it. The title is then passed through a validation method and, if valid, returned to the calling function.
- The image title is then sent, together with the metadata, to the upload method from pywikibot.
- A message is sent upon success or failure. Images that were not uploaded are also shown to the user.
- Logs on the progress are kept in the logging file.
Alternatively, a GLAM can simply click on a button, provided on its website, to upload all of its recently prepared images. This button triggers the upload function of our framework and carries the GLAM's name. Once triggered, step 2 through the last step above are executed.
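The flow above can be sketched end to end as follows. This is a rough illustration under stated assumptions, not the final design: the OAuth step is left out, the `mappers`, `make_title` and `upload` arguments are placeholders (the `upload` callable stands in for whatever wraps the pywikibot upload method), and the set of accepted licenses and the item keys are invented for the example.

```python
import logging

logger = logging.getLogger("singleBU")

# Hypothetical set of licenses accepted for upload (lower-cased).
ALLOWED_LICENSES = {"cc0", "cc-by-4.0", "cc-by-sa-4.0", "pd"}


def parse_upload_path(path):
    """Split 'app_name/upload/GLAM/ID' into (GLAM, ID)."""
    parts = [p for p in path.strip("/").split("/") if p]
    if len(parts) != 4 or parts[1] != "upload":
        raise ValueError("expected app_name/upload/GLAM/ID, got %r" % path)
    return parts[2], parts[3]


def process_upload(path, mappers, make_title, upload):
    """Run mapping, license check, title generation and upload for one request.

    mappers:    dict of GLAM name -> function(identifier) returning item dicts
    make_title: function(title_params, extension) returning a validated title
    upload:     function(file_url, title, item) performing the actual upload
    """
    glam, identifier = parse_upload_path(path)
    mapper = mappers[glam]  # raises KeyError for an unknown GLAM
    uploaded, failed = [], []
    for item in mapper(identifier):
        try:
            if item["license"].lower() not in ALLOWED_LICENSES:
                raise ValueError("license not accepted: " + item["license"])
            title = make_title(item["title_params"], item["extension"])
            upload(item["file_url"], title, item)
            uploaded.append(title)
            logger.info("uploaded %s as %s", item["file_url"], title)
        except Exception as exc:
            # Broad catch so one bad item does not abort the batch; the
            # failure is logged and reported back to the user.
            failed.append((item.get("file_url"), str(exc)))
            logger.error("failed to upload %s: %s", item.get("file_url"), exc)
    return {"uploaded": uploaded, "failed": failed}
```

Passing the mapper, title generator and uploader in as arguments keeps this function testable without network access, and the returned dictionary carries what the success or failure message would need to show.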
Development Schedule / Timeline
May 4 - May 30 (3.6 weeks)
- Create relevant tasks on phabricator for specific discussions.
- Gather list of GLAMs with help of mentors (classifying them in terms of relevance)
- Request for account on wikimedia tool labs
- Look more in depth into tools like flickrripper, mapillary and GWToolset to understand their implementations, strengths and weaknesses, and so identify where singleBU (single batch upload) can stand out.
- Research on OAuth implementation in wikimedia
- Document and report findings so as to decide on the final structure of singleBU
- Write the design document of singleBU
June 1 - June 10 (1.3 weeks)
- Create the Flask app from scratch, design and implement all the necessary endpoints.
- Write tests for the work done.
- Push the application to Tool Labs (hoping access will be granted by then); subsequent updates will be pushed regularly.
- Document and report results to mentors.
June 11 - June 25 (2 weeks)
- Implement the UI templates.
- Implement OAuth to authenticate user account before uploading to Commons.
- Deploy the framework to toollabs.
- Develop a scheme to implement the License verification.
- Document, report and write automated tests for the work done.
June 26 - June 30 (5 days)
- First Evaluation (Phase I).
June 30 - July 23 (3.3 weeks)
- Set up a meeting with a GLAM member to discuss the tool.
- Implement metadata-mapping for all GLAMs collected. It may be a generic form common to all GLAMs or specific to each GLAM, depending on the conclusions of the research above (the current sample on my Github shows a specific implementation).
- Write generic scripts common to GLAMs like GenerateTitle and UploadToCommons.
- Write automated test scripts for work done and track execution using a log file.
- Document and report to mentor.
July 24 - July 28 (5 days)
- Second Evaluation (Phase II).
July 28 - August 10 (2 weeks)
- Continue implementing metadata_mapping for any remaining GLAMs.
- Implement License validation.
- Finalize implementation of logs and automated tests.
- Document work done and report to mentors.
August 11 - August 20 (1.4 weeks)
- Write a wiki page about the tool.
- Keep improving the application based on user feedback.
- Improve and Review documentation (both technical and user documents).
- Report to mentors.
Work after GSoC 2017 on the project
I am still very new to open-source organizations, and I wish to make a contribution that will impact people positively and free of charge. I therefore intend to ensure this project's continuity by improving the framework based on feedback from the community and the users, even after the internship period.
Time Availability during GSoC
I would put in at least 40 hours per week (4am-8am and 3pm-9pm per day, UTC+1:00) on this project and will occasionally code during weekends, most likely in the evenings on Saturdays and/or Sundays. I will keep my mentors regularly informed of my progress and regularly update my wiki report page or the project's Phabricator ticket (reporting). I will always be available to respond to my mentors' messages, with a maximum delay of 15 minutes.
I am in my last year of computer engineering. The year consists of one semester of classes (already completed, 32 credits), one semester of required internship, and a final project to defend. I intend to use GSoC as my internship (30 credits), and the final product will be defended in August as the work done during my internship. So if I am selected for GSoC, I will work on it full time until August, when I will have some meetings with my academic supervisor.
Why Wikimedia Foundation (WMF)?
I heard of the Wikimedia Foundation from a friend who did GSoC last year and started a movement for Wikimedia Cameroon, which I decided to join. I am an advocate of openness, and this organisation aligns with my principles and focuses on encouraging the growth and distribution of free content to help the world in general and my continent in particular. Hence I am highly motivated to contribute my knowledge and technical know-how to the movement in my own way.
Some Past Experience
- Trounce flow: worked as a freelancer with a UK-based startup to extract, process and visualize financial data from the Bloomberg API.
- Contact Directory: used to manage all sorts of contacts in the cloud; it can be accessed via mobile.
- Facial Recognition: facial recognition using OpenCV for a school attendance system.
My Programming skills
- Programming Languages: JavaScript (good), PHP (above average), Python (very good), Java (very good).
- Frameworks / Tools: Django (excellent), Flask (good), git/Github/gitlab, SSH.
About Me
- Currently completing my last year in computer engineering from the University of Buea, Cameroon.
- I am a member of GDG Buea (Google Developer Group Buea) and like to promote women in tech by organizing PL/meetups for girls in high school.
- I was a mentor at the programming language (PL) meetup organized by GDG Buea in 2016, which gave non-techies a free introduction to programming.
- I am an advocate of openness and have taken part in local Open conferences to promote openness in society.
- More about my personal attributes: Plum Test.