- DrTrigon (1st development phase, around 2013)
- https://commons.wikimedia.org/wiki/User:DrTrigonBot/doc - contains old/most recent TODO list, original opencv bot proposal, etc.
- alternative to pHash (not developed since 2013): http://blockhash.io/ (or just create an icon by averaging over pixels, i.e. by reducing resolution/scale/zoom)
- wavelet decompositions for peak detection, color regions, fingerprinting/hashing, frequency decomposition, denoising, compression, etc.
- code/software:
- http://www.pybytes.com/pywavelets/regression/wp2d.html (supports 2D data, mature, see WaveletPacket2D.get_leaf_nodes() and store as xml/json)
- http://jseabold.net/blog/2012/02/23/wavelet-regression-in-python/
- literature/paper:
- https://www.researchgate.net/post/How_wavelet_transform_coefficient_used_for_image_classification
- http://www.cmapx.polytechnique.fr/~yu/publications/ICPR08Final.pdf <- implement this as it supports object recognition, texture and satellite image classification, text/image language identification, sound classification
- patch transformation: http://people.csail.mit.edu/taegsang/Documents/CVPRPatch.pdf
- A Tutorial of Wavelet for Pattern Recognition (Guan-Chen Pan): https://www.researchgate.net/file.PostFileLoader.html?id=54b4d915d685ccc6468b4652&assetKey=AS%3A273675431940106%401442260715381
- http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.684.5988&rep=rep1&type=pdf
- http://ac.els-cdn.com/S0377042706006431/1-s2.0-S0377042706006431-main.pdf?_tid=dca58f04-2708-11e6-9c23-00000aab0f27&acdnat=1464683149_3f76b534460f9c998182d86714c80597 (watermarking)
- http://soundlab.cs.princeton.edu/publications/2001_amta_aadwt.pdf
- Text Classification by Aggregation of SVD Eigenvectors: http://delab.csd.auth.gr/papers/ADBIS2012skm.pdf (might not be very useful... how many texts do we need to categorize?)
- Chapter 15 - BLIND SOURCE SEPARATION: http://www.mit.edu/~gari/teaching/6.555/LECTURE_NOTES/ch15_bss.pdf
- code/software:
- head pose estimation
- see: http://rpg.ifi.uzh.ch/software_datasets.html (Perspective 3-Point (P3P) Algorithm)
- T137558: render error detection (see T136934)
- using convert (ImageMagick), the Commons default, allows comparing Commons results against other libraries, e.g. to find rendering errors; see https://github.com/AbdealiJK/file-metadata/issues/37
- T61499: DRTRIGON-124 catimages several new features (todo list from commons)
- https://commons.wikimedia.org/wiki/User:DrTrigonBot/doc - contains old/most recent TODO list, original opencv bot proposal, etc.
- AbdealiJK, jayvdb, DrTrigon (2nd development phase, GSoC 2016)
- T135836#2314683, T135836#2314835: face recognition (e.g. as on Facebook) as well as age and gender - needs some kind of DB (e.g. Commons)
- T135836#2314683: facial landmarks
- audio fingerprinting and recognition
- @DrTrigon had a nice IRC chat with rillke, very supportive and inspiring:
- https://acoustid.org/fingerprinter
- https://bitbucket.org/acoustid/profile/repositories
- docker container and/or puppet script for vagrant (labs, VM e.g. for Windows users)
- http://echoprint.me/
- https://github.com/spotify/echoprint-codegen (this could be a game changer)
- https://github.com/AbdealiJK/file-metadata/issues/15
- Detect line drawings
- Detect pie charts
- Detect line charts
- Detect if SVG
- T134644: Categorize wikimedia logos
- T137558: Find a way to detect and mark/flag/tag errors in files and mediawiki software
- T138119: Use user-maintained bot run mode to gain stats and learn
- learning? how time-consuming? (we should not spend too much time on something that we cannot finish, though it should actually be quite easy to have a first theoretically working script)
- train the bot with images of persons we know in advance will appear in a dataset (e.g. generals or politicians during wars, olympic games, etc.)
- train the bot on the dataset itself at least after humans have gone over it
- Z441#5618: what happens if you take an image on which flandmark fails to detect anything, add some amount of random noise, resize and rotate it a bit, and retry - as if you were sitting in front of the cam and moved and tilted your head a bit until it gets the detection
- https://pypi.python.org/pypi/tesserocr - detect images that are actually text
- fingerprint and recognize cameras by chip noise pattern instead of bad pixels
- include video2copyright project features:
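
As a rough illustration of the "icon by averaging over pixels" idea listed above as a pHash alternative, here is a minimal sketch in plain Python. It is not the blockhash or pHash algorithm itself; the function names, the 8x8 icon size and the list-of-rows grayscale input format are assumptions made for this example only.

```python
def average_hash(pixels, hash_size=8):
    """Hash a grayscale image given as a list of rows of 0-255 values.

    The image is reduced to a hash_size x hash_size icon by averaging
    over pixel blocks; each icon cell then contributes one bit, set when
    the cell is brighter than the mean of the whole icon.
    (hash_size=8 is an assumption for this sketch, giving a 64-bit hash.)
    """
    height = len(pixels)
    width = len(pixels[0])
    icon = []
    for by in range(hash_size):
        # Row range of this block (at least one row, even for tiny images).
        y0 = by * height // hash_size
        y1 = max((by + 1) * height // hash_size, y0 + 1)
        for bx in range(hash_size):
            x0 = bx * width // hash_size
            x1 = max((bx + 1) * width // hash_size, x0 + 1)
            block = [pixels[y][x] for y in range(y0, y1) for x in range(x0, x1)]
            icon.append(sum(block) / len(block))

    mean = sum(icon) / len(icon)
    bits = 0
    for value in icon:
        bits = (bits << 1) | (1 if value > mean else 0)
    return bits


def hamming_distance(hash_a, hash_b):
    """Number of differing bits; small distances suggest similar images."""
    return bin(hash_a ^ hash_b).count("1")
```

Two files would then be compared via `hamming_distance(average_hash(a), average_hash(b))`; what distance counts as "similar" would have to be tuned on real Commons data.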
Categories to assign (see https://etherpad.wikimedia.org/p/Zl7V7KuK7J):
- Category:Portraits -> size of face (ratio compared to picture size - kind of coverage) and orientation (head pose)
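
The face-size criterion for Category:Portraits could be sketched roughly as below. This is only an illustration: the `(x, y, w, h)` bounding-box format, the function names and the 10% threshold are assumptions and not taken from the existing bot code, and the head-pose check is left out.

```python
def face_coverage(face_box, image_size):
    """Fraction of the image area covered by a face bounding box.

    face_box is (x, y, w, h) in pixels and image_size is (width, height);
    both formats are assumptions for this sketch. The box is clipped to
    the image so partially visible faces are not over-counted.
    """
    x, y, w, h = face_box
    img_w, img_h = image_size
    if img_w <= 0 or img_h <= 0:
        raise ValueError("image size must be positive")
    x0, y0 = max(x, 0), max(y, 0)
    x1, y1 = min(x + w, img_w), min(y + h, img_h)
    visible_area = max(x1 - x0, 0) * max(y1 - y0, 0)
    return visible_area / (img_w * img_h)


def looks_like_portrait(face_boxes, image_size, min_coverage=0.10):
    """True if the largest detected face covers at least min_coverage.

    min_coverage=0.10 is a hypothetical threshold that would need tuning.
    """
    if not face_boxes:
        return False
    largest = max(face_coverage(box, image_size) for box in face_boxes)
    return largest >= min_coverage
```

For example, a 50x50 face in a 100x100 image gives a coverage of 0.25 and would pass the assumed threshold.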