Meeting 8 - Fri 17 June 2016 - 12:30 UTC
Closed, Resolved, Public
Assigned To

None

Authored By

	DrTrigon
	Jun 10 2016, 2:18 PM

Description

Date: 17 June 2016
Time: 12:30 UTC
Type: skype, fallback IRC (channel #gsoc-catimages)

Description: 8th meeting (Week 4) to discuss the project's progress with regard to the midterm evaluation and its outcome

Agenda:

final decision about T135835: Meeting about handling non-pip packages / replace externals module (DrTrigon)
check progress made towards midterm evaluation
first beta-tests?
Brief status report of the video copyright project (jayvdb)

Minutes of the Meeting:

Make a simple bot and readme - User:AbdealiJK/file-metadata and gist:a94fc0
Draft email to commons-l - User:AbdealiJK/file-metadata/Email
Upstream
- dlib - Created the fix for setup.py dlib/136
- matplotlib - Created PR to suggest installation step matplotlib/6575
- skimage - Found an unusual bug in skimage where reading a file does not give the expected output - scikit-image/2154
Bug fixes -
- Unicode encodings - There was an issue in decoding data from C libraries, pointed out by Zhuyifei (c755e7)
- Dict issue - The dict did not handle keys that previously had a value and were then changed to None. Pointed out by Zhuyifei (e11dc9)
- Large decompressed files - Pillow threw a warning for files that had too many pixels. We now warn and ignore this gracefully. Fixed in 19382d
- zxing small images - ZXing has an issue with very small images because the first 3 pixel locations were hardcoded. We now ignore small images for zxing. Fixed in 1c0de6. Fixed upstream too: zxing/607.
- zxing unsupported image type - ZXing does not support CMYK files. In some cases, exiftool is unsure whether a file is RGB or CMYK. In that case we just assume it is CMYK and convert it. Fixed in 7fc106
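The two zxing workarounds above amount to a small check made before an image is handed to the decoder. The following is only a sketch of that decision, not the code from 1c0de6 or 7fc106: the function name, the return values and the `MIN_SIDE_PX` threshold are assumptions made for illustration.

```python
# Hypothetical pre-flight check run before calling ZXing.
# MIN_SIDE_PX is an assumed threshold; the minutes only say "very small".
MIN_SIDE_PX = 4


def zxing_preflight(width, height, color_mode):
    """Return 'skip', 'convert' or 'run' for one image.

    color_mode is whatever the metadata reader reported, e.g. 'RGB' or
    'CMYK', or None when it could not decide between the two.
    """
    if width < MIN_SIDE_PX or height < MIN_SIDE_PX:
        # Too small for the decoder's fixed pixel positions: ignore it.
        return "skip"
    if color_mode is None or color_mode.upper() == "CMYK":
        # Undecided cases are treated as CMYK and converted first.
        return "convert"
    return "run"
```

Treating the undecided case as CMYK costs one extra conversion for some RGB files but avoids passing an unsupported image to the decoder.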
New features
- zbar support - Added zbar support 9a8e62. zbar detects barcodes that zxing does not detect, especially vertical barcodes. It seems to have more false positives too. zbar also has some performance issues: it is considerably slower than zxing (3 times slower on my computer) and considerably more memory intensive (my system hangs when running the zbar tests).
  - this is very nice, though it detects some types wrongly
  - we could improve this by using zbar to detect rotated codes, then rotating the image and running zxing over it (again); iterate this at most 3 times to cover all possibilities (maybe mirroring could be useful too) - so use zxing to verify and get more reliable detection
  - we need to find products/use cases/wiki project for this barcode detection, e.g.
    - product recognition: https://commons.wikimedia.org/wiki/File:HK_CWB_Yee_Wo_Street_%E5%A4%A7%E7%8F%AD%E9%BA%B5%E5%8C%85%E8%A5%BF%E9%A4%85_TaiPan_bakery_breads_Sandwich_plastic_bag_pre-packed_food_Nutrition_Information_Sept-2013_San_Po_Kong_LHIB.JPG contains Data: 4891267000171, Format: EAN13
      - see https://en.wikipedia.org/wiki/Universal_Product_Code
      - database lookup: http://www.upc-search.org/ and http://www.ean-search.org/
      - no result for UPC: http://www.upc-search.org/perl/upc-search.pl?q=4891267000171
      - no result for EAN: http://www.ean-search.org/perl/ean-search.pl?q=4891267000171
      - so it does not look easy to find some of the products
    - qr codes on commons: https://commons.wikimedia.org/wiki/Category:Quick_Response_Codes
    - WikiProject QRpedia is a mobile Web based system which uses QR codes to deliver Wikipedia articles to users, in their preferred language: https://commons.wikimedia.org/wiki/Category:QRpedia
  - a bit concerned about the speed of the zbar code (maybe because it is 6 years old)
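The rotate-and-verify idea from the zbar discussion can be sketched as follows. This is an illustration under assumptions, not project code: neither zbar nor zxing is called here. The two detectors are passed in as plain callables (`loose_detector` standing in for zbar, `strict_detector` for zxing), the image is a row-major 2D list, and all names are hypothetical.

```python
def rotate90(pixels):
    """Rotate a row-major 2D list by 90 degrees clockwise."""
    return [list(row) for row in zip(*pixels[::-1])]


def verify_by_rotation(pixels, loose_detector, strict_detector,
                       max_rotations=3):
    """Confirm the loose detector's candidates with the strict detector.

    The loose detector runs once on the original image. The strict
    detector is then tried on the original orientation and on up to
    max_rotations quarter turns. Returns (confirmed_codes, quarter_turns),
    or (set(), None) if nothing could be confirmed.
    """
    candidates = set(loose_detector(pixels))
    if not candidates:
        return set(), None
    current = pixels
    for turns in range(max_rotations + 1):
        confirmed = candidates & set(strict_detector(current))
        if confirmed:
            return confirmed, turns
        current = rotate90(current)
    return set(), None
```

Only codes reported by both detectors are returned, which is one way to keep the higher recall of the loose detector without inheriting its false positives. Mirroring could be added as a further transformation in the same loop.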
- Created with - Analyze the exif data and provide a simpler analysis routine that checks for the traces left by different software. This routine has a curated list of software, each of which has a specific key associated with it. 3eb80c
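A curated list of this kind can be pictured as a lookup table from software name to the metadata key that betrays it. The sketch below is not the routine from 3eb80c: the table entries, the exiftool-style key names and the function name are illustrative assumptions.

```python
# Hypothetical curated table: software name -> (metadata key, marker).
# The marker is a lowercase substring expected in that key's value.
SOFTWARE_TRACES = {
    "GIMP": ("EXIF:Software", "gimp"),
    "Adobe Photoshop": ("XMP:CreatorTool", "photoshop"),
}


def created_with(metadata):
    """Return the sorted names of software whose trace is in metadata.

    metadata is a flat dict of key -> value as a metadata reader might
    return it; missing keys and None values are treated as "no trace".
    """
    found = []
    for name, (key, marker) in SOFTWARE_TRACES.items():
        value = str(metadata.get(key) or "").lower()
        if marker in value:
            found.append(name)
    return sorted(found)
```

Keeping the list as data rather than code means that supporting another program is a one-line addition to the table.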
Bulk tests - I got the bot running on the Tool Labs server (Zhuyifei wanted to test it there and I thought it would be a good idea to get it running there). I also got it to run in bulk on Tool Labs (because Travis' time limit was getting annoying). Made the logs at ...logs/Category_Images_from_the_State_Library_of_Queensland and .../logs/Category_JPEG_files
- we already have some very nice results here (few or no false positives), even non-portraits get detected quite well: https://commons.wikimedia.org/wiki/File:StateLibQld_1_211364_Neil_Cameron.jpg
- some faces do not get recognized at all - we would like to improve this
- e.g. train the bot with images of persons we know in advance will appear in a dataset (e.g. generals or politicians during wars, etc.)
- for next week, implement haarcascade; there is also code for training, but it is unclear how that will proceed (let's see ;)
- maybe also train the bot on the dataset itself, at least after humans have gone over it
T135835: we go for Docker (with the help of e.g. Vagrant and a VM like VirtualBox) and conda (mainly on Windows, to fulfill dependencies)
MVP:
- Categorizing based on metadata actually needs the bot to write to Commons, but that is a bad idea during beta testing since it might cause chaos. For now, just print the proposed changes to the wiki page to the console and improve that continuously until we are stable enough to mass-write
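The print-only mode for the MVP can be sketched with the standard library alone. This is an assumption about how such a dry run might look, not the bot's actual code: the function names are hypothetical, and appending `[[Category:...]]` links at the end of the page is a simplification of how categories are placed on Commons file pages.

```python
import difflib


def propose_categories(wikitext, categories):
    """Return wikitext with any missing [[Category:...]] links appended."""
    missing = ["[[Category:%s]]" % c for c in categories
               if "[[Category:%s]]" % c not in wikitext]
    if not missing:
        return wikitext
    return "\n".join(wikitext.splitlines() + missing) + "\n"


def dry_run(title, wikitext, categories):
    """Print a unified diff of the proposed edit; nothing is written."""
    proposed = propose_categories(wikitext, categories)
    diff = list(difflib.unified_diff(
        wikitext.splitlines(), proposed.splitlines(),
        fromfile=title + " (current)", tofile=title + " (proposed)",
        lineterm=""))
    for line in diff:
        print(line)
    return diff
```

Returning the diff as well as printing it lets the same function feed a log file, so beta testers can review proposed changes before mass-writing is ever enabled.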
we have first progress regarding beta-testers: zhuyifei1999 and 99of9 (thank you very much for your participation! that's exciting!)
- a bot script was made and runs on Tool Labs - it needs to be documented (e.g. like https://wikitech.wikimedia.org/wiki/DrTrigonBot, or better in userspace for now) to get more testers
- have 2 different bot run modes:
  - auto: be conservative - no false positives, so as not to annoy Commons users with unreliable bot work (and give the maintainer a lot of work to fix things)
  - user-maintained: be more experimental - show ALL possible results (no matter how significant) and let the user decide which ones are valid
- this brings me to the idea T138119: Use user-maintained bot run mode to gain stats and learn
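The difference between the two run modes can be reduced to how results are filtered before they are shown or acted on. The sketch below is illustrative only: it assumes each result carries a confidence score between 0 and 1, and both the `AUTO_MIN_CONFIDENCE` value and the function name are hypothetical.

```python
# Assumed cut-off for the conservative mode; not a value from the project.
AUTO_MIN_CONFIDENCE = 0.9


def select_results(results, mode):
    """Filter (label, confidence) pairs according to the bot run mode.

    'auto' keeps only high-confidence results, so the bot stays quiet
    rather than risk a false positive. 'user-maintained' keeps every
    result, most confident first, and leaves the decision to the user.
    """
    if mode == "auto":
        return [r for r in results if r[1] >= AUTO_MIN_CONFIDENCE]
    if mode == "user-maintained":
        return sorted(results, key=lambda r: r[1], reverse=True)
    raise ValueError("unknown run mode: %r" % (mode,))
```

If the user's accept/reject decisions in user-maintained mode are recorded, they could later be used to tune the threshold for auto mode, which is the direction T138119 points in.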
the video copyright project's status is very promising (even with a short timeout) and it is close to being able to name uploaded movies
- we have to think about integrating it into this project (cross-language Python and .NET)
- after the MVP we will start to work together (hacking session, meetings/communication, etc.)

Related Objects

Status	Subtype	Assigned	Task
Invalid		None	T72936 Important tasks to be solved (tracking)
Open	Feature	None	T57880 Functionality existing in compat but missing from core (tracking)
Declined		None	T66838 Port catimages.py to core
Resolved		AbdealiJK	T129611 [GSoC 2016 Proposal] Port catimages.py to pywikibot-core
Resolved		01tonythomas	T134721 Weekly reports of GSoC 2016 projects (tracking)
Resolved		AbdealiJK	T133762 [GSoC requirement] Weekly Reports for Port catimages.py to pywikibot-core
Resolved		None	T137557 Meeting 8 - Fri 17 June 2016 - 12:30 UTC