Change Details

- DrTrigon (1. Development Phase around 2013)
  - [ ] https://commons.wikimedia.org/wiki/User:DrTrigonBot/doc - contains old/most recent TODO list, original OpenCV bot proposal, etc.
  - [ ] alternative to pHash (not developed since 2013): http://blockhash.io/ (or just create an icon by averaging over pixels, i.e. reducing resolution/scale/zoom)
  - [ ] wavelet decompositions for peak detection, color regions, fingerprinting/hashing, frequency decomposition, denoising, compression, etc.
    - code/software:
      - http://www.pybytes.com/pywavelets/regression/wp2d.html (supports 2D data, mature, see WaveletPacket2D.get_leaf_nodes() and store as xml/json)
      - http://jseabold.net/blog/2012/02/23/wavelet-regression-in-python/
    - literature/paper:
      - https://www.researchgate.net/post/How_wavelet_transform_coefficient_used_for_image_classification
      - http://www.cmapx.polytechnique.fr/~yu/publications/ICPR08Final.pdf <- **implement this as it supports object recognition, texture and satellite image classification, text/image language identification, sound classification**
      - patch transformation: http://people.csail.mit.edu/taegsang/Documents/CVPRPatch.pdf
      - http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.684.5988&rep=rep1&type=pdf
      - http://ac.els-cdn.com/S0377042706006431/1-s2.0-S0377042706006431-main.pdf?_tid=dca58f04-2708-11e6-9c23-00000aab0f27&acdnat=1464683149_3f76b534460f9c998182d86714c80597 (watermarking)
      - http://soundlab.cs.princeton.edu/publications/2001_amta_aadwt.pdf
      - **Text Classification by Aggregation of SVD Eigenvectors**: http://delab.csd.auth.gr/papers/ADBIS2012skm.pdf (might not be very useful... how many texts do we need to categorize?)
      - Chapter 15 - BLIND SOURCE SEPARATION: http://www.mit.edu/~gari/teaching/6.555/LECTURE_NOTES/ch15_bss.pdf
  - [ ] head pose estimation - see: http://rpg.ifi.uzh.ch/software_datasets.html (Perspective 3-Point (P3P) Algorithm)
    - http://rpg.ifi.uzh.ch/software/p3p_code_final.zip
    - http://rpg.ifi.uzh.ch/docs/CVPR11_kneip.pdf
  - [ ] T137558: render error detection (see T136934)
    - using `convert` (ImageMagick), the Commons default, allows comparing Commons results against other libraries and e.g. finding rendering errors, see https://github.com/AbdealiJK/file-metadata/issues/37
    - possible categories to check for testing are:
      - https://commons.wikimedia.org/wiki/Category:PDF_files_affected_by_MediaWiki_restrictions
      - https://commons.wikimedia.org/wiki/Category:Images_without_thumbnails
      - https://commons.wikimedia.org/wiki/Category:Images_with_render_problem
  - [ ] {T61499}
- AbdealiJK, jayvdb, DrTrigon (2. Development Phase GSoC 2016)
  - [ ] T135836#2314683, T135836#2314835: face recognition (e.g. like Facebook) as well as age and gender - needs some kind of DB (e.g. Commons)
  - [ ] T135836#2314683: facial landmarks
  - [ ] audio fingerprinting and recognition
    - https://news.ycombinator.com/item?id=8303713
    - http://willdrevo.com/fingerprinting-and-audio-recognition-with-python/
    - https://github.com/worldveil/dejavu
    - @DrTrigon had a nice IRC chat with rillke, very supportive and inspiring:
      - [ ] https://acoustid.org/fingerprinter
      - [ ] https://bitbucket.org/acoustid/profile/repositories
      - [ ] docker container and/or puppet script for vagrant (labs, VM e.g. for win users)
      - [ ] http://echoprint.me/
      - [ ] https://github.com/spotify/echoprint-codegen (this could be show changer)
  - [ ] https://github.com/AbdealiJK/file-metadata/issues/15
    - Detect line drawings
    - Detect pie charts
    - Detect line charts
    - Detect if SVG
  - [ ] {T134644}
  - [ ] {T137558}
  - [ ] {T138119}
  - [ ] learning? how time consuming? (not to spend too much time on something that we cannot finish - though it actually should be quite easy to have a first theoretically working script)
    - [ ] train the bot with images of persons we know in advance will appear in a dataset (e.g. generals or politicians during wars, etc.)
    - [ ] train the bot on the dataset itself, at least after humans have gone over it
  - [ ] Z441#5618: what happens if you take an image flandmark cannot detect, add some amount of random noise, resize and rotate it a bit and re-try - as if you were sitting in front of the cam and moved and tilted your head a bit until it gets the detection
  - [ ] https://pypi.python.org/pypi/tesserocr - detect images that are actually text

Categories to assign (see https://etherpad.wikimedia.org/p/Zl7V7KuK7J):
- [[ https://commons.wikimedia.org/wiki/Category:Portraits | Category:Portraits ]] -> size of face (ratio compared to picture size - kind of coverage) and orientation (head pose)
-
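
The face-size criterion for Category:Portraits (face area relative to picture area) could be sketched roughly as below. This is only an illustration: the `Box` type, the `face_coverage` and `looks_like_portrait` helpers and the `min_coverage` threshold of 0.1 are hypothetical names and values, not part of the task, and the face box is assumed to come from whatever face detector ends up being used. Orientation (head pose) is not covered here.

```python
from typing import NamedTuple


class Box(NamedTuple):
    """Axis-aligned face bounding box in pixels (hypothetical detector output)."""
    x: int
    y: int
    width: int
    height: int


def face_coverage(face: Box, image_width: int, image_height: int) -> float:
    """Return the fraction of the image area covered by the face box."""
    if image_width <= 0 or image_height <= 0:
        raise ValueError("image dimensions must be positive")
    # Clip the box to the image so partially visible faces are not over-counted.
    left = max(face.x, 0)
    top = max(face.y, 0)
    right = min(face.x + face.width, image_width)
    bottom = min(face.y + face.height, image_height)
    visible_area = max(right - left, 0) * max(bottom - top, 0)
    return visible_area / (image_width * image_height)


def looks_like_portrait(face: Box, image_width: int, image_height: int,
                        min_coverage: float = 0.1) -> bool:
    """Assumed rule: a portrait candidate has a face covering >= min_coverage."""
    return face_coverage(face, image_width, image_height) >= min_coverage


if __name__ == "__main__":
    face = Box(x=100, y=50, width=200, height=300)
    print(face_coverage(face, 800, 600))
    print(looks_like_portrait(face, 800, 600))
```

The threshold would have to be tuned against files already sorted into the category by humans; the value above is a placeholder.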