
file-metadata testing log
Open, Needs Triage, Public

Description

This is the test protocol and log task for file-metadata. For summary of working instructions see https://commons.wikimedia.org/wiki/User:DrTrigon/file-metadata.

Set up a VirtualBox VM with osboxes.org Kubuntu_14.04.3-64bit.7z (see http://www.osboxes.org/kubuntu/), then boot the disk in VirtualBox (a change of the disk UUID may be needed), log in and open 'Term'/Konsole:

$ sudo apt-get update
$ sudo apt-get upgrade
$ sudo apt-get autoremove
$ sudo apt-get install python-pip

Insert the Guest Additions medium in VirtualBox:

$ cd /media/osboxes/VBOXADDITIONS_4.3.36_105129
$ sudo ./VBoxLinuxAdditions.run
$ sudo shutdown -r now

Enable the bidirectional clipboard.

To avoid repeating these steps after every reset, the VM was exported as 'Wikimedia GSoC Testing.ova' (OVF 1.0 including manifest) and a snapshot 'plain/unused' was created. Either can be used to jump back to a fresh (reset) machine.

For the installation process see: https://commons.wikimedia.org/wiki/User:DrTrigon/file-metadata

Event Timeline

osboxes@osboxes:~$ pip install file-metadata
Downloading/unpacking file-metadata
  Could not find a version that satisfies the requirement file-metadata (from versions: 0.0.1.dev20160603144066, 0.0.1.dev20160314064516, 0.0.1.dev20160603144056)
Cleaning up...
No distributions matching the version for file-metadata
Storing debug log for failure in /home/osboxes/.pip/pip.log
osboxes@osboxes:~$
osboxes@osboxes:~$ pip install --pre file-metadata
Downloading/unpacking file-metadata
  Downloading file-metadata-0.0.1.dev20160603144066.tar.gz
  Running setup.py (path:/tmp/pip_build_osboxes/file-metadata/setup.py) egg_info for package file-metadata
    `exiftool` (http://www.sno.phy.queensu.ca/~phil/exiftool/) needs to be installed and needs to be made available in your PATH.
    Complete output from command python setup.py egg_info:
    `exiftool` (http://www.sno.phy.queensu.ca/~phil/exiftool/) needs to be installed and needs to be made available in your PATH.

----------------------------------------
Cleaning up...
Command python setup.py egg_info failed with error code 1 in /tmp/pip_build_osboxes/file-metadata
Storing debug log for failure in /home/osboxes/.pip/pip.log
osboxes@osboxes:~$

So apparently 'exiftool' was missing:

osboxes@osboxes:~$ sudo apt-get install exiftool

Then a similar message appeared, but for 'OpenCV':

osboxes@osboxes:~$ sudo apt-get install python-opencv
osboxes@osboxes:~$ pip install --pre file-metadata
Downloading/unpacking file-metadata
  Downloading file-metadata-0.0.1.dev20160603144066.tar.gz
  Running setup.py (path:/tmp/pip_build_osboxes/file-metadata/setup.py) egg_info for package file-metadata
    java version "1.7.0_79"
    OpenJDK Runtime Environment (IcedTea 2.5.6) (7u79-2.5.6-0ubuntu1.14.04.1)
    OpenJDK 64-Bit Server VM (build 24.79-b02, mixed mode)
    error in file-metadata setup command: 'install_requires' must be a string or list of strings containing valid project/version requirement specifiers
    Complete output from command python setup.py egg_info:
    java version "1.7.0_79"

OpenJDK Runtime Environment (IcedTea 2.5.6) (7u79-2.5.6-0ubuntu1.14.04.1)

OpenJDK 64-Bit Server VM (build 24.79-b02, mixed mode)

error in file-metadata setup command: 'install_requires' must be a string or list of strings containing valid project/version requirement specifiers

----------------------------------------
Cleaning up...
Command python setup.py egg_info failed with error code 1 in /tmp/pip_build_osboxes/file-metadata
Storing debug log for failure in /home/osboxes/.pip/pip.log
osboxes@osboxes:~$

Here I am stuck, since I don't see what the issue is exactly:

  • is Java version "1.7.0_79" required, or is the installed one wrong?
  • what do I have to install, and how?

What is the actual bot-script?

@DrTrigon Yeah, it's all in bulk_test - I tried making a "real" bot script yesterday, but after the meeting we decided that the functionality was not needed right now, so I abandoned it.

I thought I'd implement it when it's needed, as dead code which isn't being used right now just complicates things (and requires refactoring when things change in file-metadata).

Agree. (If you write one, it should be not much more than the basic bot script example - very simple.)

$ sudo apt-get install python-dev
$ sudo python -m pip install --ignore-installed cython
$ sudo apt-get install pkg-config
$ sudo apt-get install libfreetype6-dev libpng12-dev
$ sudo apt-get install liblapack-dev libblas-dev
$ sudo apt-get install gfortran
$ sudo apt-get install cmake
$ sudo apt-get install libboost-python-dev
$ sudo apt-get install liblzma-dev
$ sudo pip install --pre file-metadata
$ python -c'import file_metadata; print file_metadata.version'
0.0.1.dev99999999999999

installed file-metadata on a Kubuntu VirtualBox machine/guest:

osboxes@osboxes:~$ uname -a
Linux osboxes 3.19.0-25-generic #26~14.04.1-Ubuntu SMP Fri Jul 24 21:16:20 UTC 2015 x86_64 x86_64 x86_64 GNU/Linux
osboxes@osboxes:~$ lsb_release -a
No LSB modules are available.
Distributor ID: Ubuntu
Description: Ubuntu 14.04.4 LTS
Release: 14.04
Codename: trusty

comments about using a virtualenv to install file-metadata:

[18:12:56] <AbdealiJK> it normally is a good idea, as otherwise the things installed with apt-get may clash with pip <- and stuff gets confusing
[18:13:18] <AbdealiJK> But we can continue without.

[18:18:52] <AbdealiJK> Because youre not using a virtualenv you need sudo for all pip commands
[18:19:05] <DrTrigon> my mistake... ;) again...
[18:19:11] <AbdealiJK> as pip installs stuff in /usr/local/lib... which requires sudo to write to

Packages on pypi do not necessarily contain compiled code (e.g. scipy, which needs Fortran, is compiled every time it is installed); for pre-built binaries there are wheels (http://pythonwheels.com/), which replace the 'eggs' of the old days.

[19:24:00] <AbdealiJK> DrTrigon_, the package uploader can make binaries for their package. But the binary may not be for your system, hence it compiles in such cases
[19:24:24] <DrTrigon_> kubuntu is not uncommon, right?
[19:24:25] <jayvdb> ^ the binary wheel format, a Python spec, not a pip thing
[19:24:36] <AbdealiJK> DrTrigon_, For example, the dlib package owner only makes binaries for windows

(on the other hand, the fact that pip compiles code is good for us)
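Whether pip can use a wheel depends on the wheel's compatibility tag matching the running interpreter. Roughly how the tag components are formed can be sketched with the stdlib (an approximation; pip's real tag logic covers more cases, and the ABI tag needs build-configuration details):

```python
import sys
import sysconfig

# A wheel tag has three parts: python tag, ABI tag, platform tag,
# e.g. cp27-cp27mu-manylinux1_x86_64. Approximate the outer two for
# the interpreter running this script:
py_tag = "cp%d%d" % sys.version_info[:2]
platform_tag = sysconfig.get_platform().replace("-", "_").replace(".", "_")
print(py_tag, platform_tag)
```

On the Kubuntu guest above this would print something like cp27 linux_x86_64, which is why the cp27/manylinux1_x86_64 wheels on pypi are candidates there.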

continue by running the tests, like:

[19:46:44] <DrTrigon_> AbdealiJK: How to run the test now? bulk_test.py?
[19:47:16] <AbdealiJK> DrTrigon_, You need the test-requirements.txt for that (from git) - it has the same test requirements as pywikibot, so jayvdb has them installed already
[19:48:01] <AbdealiJK> pip install pytest pytest-env mock pytest-timeout - should be enough I think

[19:49:42] <AbdealiJK> DrTrigon_, Nope, sorry for being vague on details. pip install -r test-requirements.txt and then python -m pytest
[19:50:26] <DrTrigon_> ...with the test-requirements.txt from github?

Set up the test environment, e.g. in the VirtualBox home directory:

$ sudo pip install pytest pytest-env mock pytest-timeout
$ wget https://raw.githubusercontent.com/AbdealiJK/file-metadata/master/test-requirements.txt
$ sudo pip install -r test-requirements.txt

~$ sudo apt-get install git
~$ sudo pip install pytest pytest-env mock pytest-timeout
~$ git clone https://github.com/AbdealiJK/file-metadata
~$ cd file-metadata/
~/file-metadata$ wget https://raw.githubusercontent.com/AbdealiJK/file-metadata/master/test-requirements.txt
~/file-metadata$ sudo pip install -r test-requirements.txt
~/file-metadata$ python -m py.test
[...]

4 failed, 40 passed, 18 skipped, 5 error in 4.28 seconds

~/file-metadata$ sudo apt-get install libmagickwand-dev
~/file-metadata$ sudo apt-get install python-bs4
~/file-metadata$ sudo pip install colormath
~/file-metadata$ python -m py.test
[...]

1 failed, 69 passed, 11 skipped in 115.72 seconds

Result / Conclusions:

  • 1 failed: something wrong with the color average for green.png, see below (T136985#2360554)
  • libmagickwand-dev missing: the command to install was correctly stated in the error message -> cool, thanks! made me happy!
  • python-bs4 missing: no install command was mentioned (made me unhappy ;)
  • colormath missing: should be installed by pip during install (has to be added to requirements.txt, right?)
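A library can turn a missing optional dependency into an actionable message (like the libmagickwand install hint above) by probing imports up front. A minimal sketch; the module-to-package mapping here is purely illustrative:

```python
import importlib

# map importable module name -> suggested install command (illustrative)
HINTS = {
    'bs4': 'sudo apt-get install python-bs4',
    'colormath': 'sudo pip install colormath',
}

def missing_dependencies(hints):
    """Return (module, install hint) pairs for modules that fail to import."""
    out = []
    for module, hint in hints.items():
        try:
            importlib.import_module(module)
        except ImportError:
            out.append((module, hint))
    return out

for module, hint in missing_dependencies(HINTS):
    print("missing %s, try: %s" % (module, hint))
```

Run at import or setup time, this would have covered the python-bs4 case the same way wand covers libmagickwand.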

check packages etc.:

$ python -c'import site; print site.getsitepackages()'
['/usr/local/lib/python2.7/dist-packages', '/usr/lib/python2.7/dist-packages']

search for packages:

$ apt-cache search [package-name, e.g. 'python-bs4']

$ pip search [package-name, e.g. 'colormath']

osboxes@osboxes:~/file-metadata/tests/image$ python -m py.test -vv

test session starts

platform linux2 -- Python 2.7.6, pytest-2.9.2, py-1.4.31, pluggy-0.3.1 -- /usr/bin/python
cachedir: ../../.cache
rootdir: /home/osboxes/file-metadata, inifile: setup.cfg
plugins: env-0.6.0, timeout-1.0.0, attrib-0.1.3, xdist-1.14, cov-2.2.1
collected 26 items

image_file_test.py::ImageFileTest::test_ndarray_read PASSED
image_file_test.py::ImageFileColorAverageTest::test_color_average_animated_image PASSED
image_file_test.py::ImageFileColorAverageTest::test_color_average_greyscale_image PASSED
image_file_test.py::ImageFileColorAverageTest::test_color_average_rgb_image FAILED
image_file_test.py::ImageFileColorAverageTest::test_color_average_rgba_image PASSED
image_file_test.py::ImageFileColorAverageTest::test_color_average_unknown_format PASSED
image_file_test.py::ImageFileFaceLandmarksTest::test_facial_landmarks_animated_image PASSED
image_file_test.py::ImageFileFaceLandmarksTest::test_facial_landmarks_baby_face PASSED
image_file_test.py::ImageFileFaceLandmarksTest::test_facial_landmarks_mona_lisa PASSED
image_file_test.py::ImageFileFaceLandmarksTest::test_facial_landmarks_monkey_face PASSED
image_file_test.py::ImageFileBarcodesTest::test_barcode PASSED
image_file_test.py::ImageFileBarcodesTest::test_dmtx PASSED
image_file_test.py::ImageFileBarcodesTest::test_mona_lisa PASSED
image_file_test.py::ImageFileBarcodesTest::test_multiple_barcodes PASSED
image_file_test.py::ImageFileBarcodesTest::test_qrcode PASSED
jpeg_file_test.py::JPEGFileTest::test_filename_zxing PASSED
jpeg_file_test.py::JPEGFileBarcodesTest::test_jpeg_cmyk PASSED
jpeg_file_test.py::JPEGFileBarcodesTest::test_jpeg_qrcode PASSED
svg_file_test.py::SVGFileTest::test_fetch_svg_ndarray PASSED
svg_file_test.py::SVGFileTest::test_fetch_svg_ndarray_application_xml PASSED
svg_file_test.py::SVGFileTest::test_fetch_svg_ndarray_text_html PASSED
svg_file_test.py::SVGFileTest::test_fetch_svg_ndarray_text_plain PASSED
xcf_file_test.py::XCFFileTest::test_fetch_filename_raster PASSED
xcf_file_test.py::XCFFileTest::test_fetch_ndarray PASSED
xcf_file_test.py::XCFFileFaceLandmarksTest::test_blank PASSED
xcf_file_test.py::XCFFileFaceLandmarksTest::test_facial_landmarks_woman_face PASSED

FAILURES

________________ ImageFileColorAverageTest.test_color_average_rgb_image ________________

self = <tests.image.image_file_test.ImageFileColorAverageTest testMethod=test_color_average_rgb_image>

def test_color_average_rgb_image(self):
    data = ImageFile(fetch_file('red.png')).analyze_color_average()
    self.assertEqual(data['Color:AverageRGB'], (255, 0, 0))
    self.assertEqual(data['Color:ClosestLabeledColor'],
                     'PMS 17-1462 TPX (Flame)')
    self.assertEqual(data['Color:ClosestLabeledColorRGB'], (244, 81, 44))

    data = ImageFile(fetch_file('green.png')).analyze_color_average()
    self.assertEqual(data['Color:AverageRGB'], (0, 255, 0))

E AssertionError: Tuples differ: (255.0, 0.0, 0.0) != (0, 255, 0)
E
E First differing element 0:
E 255.0
E 0
E
E - (255.0, 0.0, 0.0)
E + (0, 255, 0)

image_file_test.py:32: AssertionError

1 failed, 25 passed in 11.99 seconds
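For reference, the quantity the failing test checks is just the per-channel mean over all pixels; the assertion error above (green.png yielding (255, 0, 0)) looks like the test received red pixel data, e.g. a stale cached download, rather than a wrong formula. A minimal sketch of the expected computation with numpy:

```python
import numpy as np

# a 4x4 solid-green RGB image
img = np.zeros((4, 4, 3), dtype=np.uint8)
img[:, :, 1] = 255

# average color: mean of each channel over all pixels
avg = tuple(float(x) for x in img.reshape(-1, 3).mean(axis=0))
print(avg)  # (0.0, 255.0, 0.0)
```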

Here is the current full output; sorry, I cannot figure out how to list the skipped ones.

Wow, I'm happy that nearly all the tests passed !
I recall having this exact same issue yesterday morning and I remember fixing it too. I think the dev999... version had that issue, and can be ignored for now.

By the way, the skipped tests are probably because of these reasons:

  • bulk_tests.py will be skipped because you don't have pywikibot installed, nor do you have a user-config set. This is the default, because otherwise you would be downloading 100,000 images ^^
  • Your system probably won't have both LibAv and FFMPEG; because of that, one of those tests will get skipped.
  • Your system will not have python-magic from Ubuntu, hence that test will be skipped too (it will use the python-magic from pypi).

You can get more info about skipped tests by adding the -r s arg to pytest.

\o/

So that means the seg fault is my own special problem to solve ;-)

Result / Conclusions:

  • 1 failed: something wrong with the color average for green.png, see below (T136985#2360554)
  • libmagickwand-dev missing: the command to install was correctly stated in the error message -> cool, thanks! made me happy!

Yep, I have made an issue https://github.com/dahlia/wand/issues/293 to fail at setup.py itself.

  • python-bs4 missing: no install command was mentioned (made me unhappy ;)

This is interesting, nothing needs bs4 as far as I know. Will test.

  • colormath missing: should be installed by pip during install (has to be added to requirements.txt, right?)

This is weird. file-metadata has pycolorname in the requirements.txt, and pycolorname has colormath in its requirements; so this should have been installed. Will check.

Maybe it's due to the numerous errors we had while installing and setting up. I could just dump the VM and give it another try.

Wow, I'm happy that nearly all the tests passed !

Nod. Thanks for your work! :)

I recall having this exact same issue yesterday morning and I remember fixing it too. I think the dev999... version had that issue, and can be ignored for now.

Is it possible to test a more recent version? Or is this too early? Anything else I can or should do? ;)

By the way, the skipped tests are probably because of these reasons:

  • bulk_tests.py will be skipped because you don't have pywikibot installed, nor do you have a user-config set. This is the default, because otherwise you would be downloading 100,000 images ^^
  • Your system probably won't have both LibAv and FFMPEG; because of that, one of those tests will get skipped.
  • Your system will not have python-magic from Ubuntu, hence that test will be skipped too (it will use the python-magic from pypi).

Next step would be to install pywikibot and ffmpeg and run the test again?

You can get more info about skipped tests by adding the -r s arg to pytest.

Tried that... Maybe I was looking in the wrong place...

\o/

So that means the seg fault is my own special problem to solve ;-)

Yes, sorry for not helping but on the other hand I am happy. I have mixed feelings about that... ;)

In my experience Fedora can be very complicated about that, even more so on a loaded system (with all kinds of stuff installed).
Since I appreciate testing on Fedora (the system I developed catimages.py on), have you tried it in a VM on a freshly installed guest?
http://www.osboxes.org/fedora/

@DrTrigon You don't need to test ffmpeg nor pywikibot. Travis does those, to ensure that the tools are being handled correctly.

They are alternatives (i.e. if you don't have libav installed but ffmpeg is, ffmpeg is used).

I think I'll try adding more features, etc before we do another testing run :D That was really tiring !

:)))) Thanks again for your work! I'm just curious and want to see some "bot code" run, so I'll play a bit. No need for you to assist here... ;)

osboxes@osboxes:~/file-metadata/tests/image$ sudo pip install pywikibot --pre
osboxes@osboxes:~/file-metadata/tests/image$ python pwb.py basic
python: can't open file 'pwb.py': [Errno 2] No such file or directory

so I guess I need the git repo as well (like with file-metadata) ... but then it does not find user-config.py from the file-metadata directory ... ?

osboxes@osboxes:~/file-metadata/tests/image$ sudo pip install pywikibot
Downloading/unpacking pywikibot
  Could not find a version that satisfies the requirement pywikibot (from versions: 2.0rc3, 2.0rc4, 2.0rc1.post2, 2.0rc1.post1)
Cleaning up...
No distributions matching the version for pywikibot
Storing debug log for failure in /home/osboxes/.pip/pip.log

does not work for me at the moment, see: T67176#2367132

From fresh/resetted machine, according to https://commons.wikimedia.org/wiki/User:AbdealiJK/file-metadata:

osboxes@osboxes:~$ sudo apt-get install perl openjdk-7-jre python-dev pkg-config libfreetype6-dev libpng12-dev liblapack-dev libblas-dev gfortran cmake libboost-python-dev liblzma-dev libjpeg-dev
osboxes@osboxes:~$ sudo pip install numpy
osboxes@osboxes:~$ sudo pip install file-metadata

Ran into the error/issue given in: http://dpaste.com/3RZ6Y1T

So I see 2 "issues":

  1. on https://commons.wikimedia.org/wiki/User:AbdealiJK/file-metadata we need to add $ pip install numpy before $ pip install file-metadata - actually pip should take care of this automatically; is this an issue with the system package management? (I checked, python-numpy was not installed)
  2. $ pip install file-metadata crashes due to pkg_resources.DistributionNotFound: cython>=0.21

Issue 1:
Hm. So, the first issue is surprising; pypi normally does not guarantee the order in which packages are installed.
So, if pkgA needs pkgB for compilation, this can be a problem.
Could you tell me which package gave the error about numpy not being installed, so I can investigate further?

Issue 2:
This is the issue we noticed earlier: cython from the system being preinstalled confuses the cython from pypi. The solution is to (1) make a virtualenv OR (2) uninstall the system cython package.
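The virtualenv route can be sketched with Python 3's stdlib venv module (the thread itself used Python 2's virtualenv package; same idea): packages installed into the environment's own directory can no longer be shadowed by apt-get's copies. Paths assume a POSIX layout:

```python
import os
import subprocess
import sys
import tempfile

# create an isolated environment in a temp dir (--without-pip keeps the
# sketch self-contained; a real setup would keep pip and then run
# `pip install --pre file-metadata` inside the environment)
env_dir = os.path.join(tempfile.mkdtemp(), "fm-env")
subprocess.check_call([sys.executable, "-m", "venv", "--without-pip", env_dir])

# the environment has its own interpreter and its own site-packages,
# so a system-wide cython from apt-get cannot leak into it
python_bin = os.path.join(env_dir, "bin", "python")
print(os.path.exists(python_bin))
```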

Could you add issue 2 as a note to the wiki page at https://commons.wikimedia.org/wiki/User:AbdealiJK/file-metadata? Or possibly in a new section, "Installation FAQ" or so?

@DrTrigon Regarding the numpy issue, I think the error message is quite clear.

This error would not happen most of the time, because scikit-image does have wheels for most systems, and hence does not have to be compiled every time.

Hence, I think it's better not to include it in the installation steps, as the error message is clear and it should not happen the majority of the time.

All decent Linux distros include python packages for many of these dependencies, and OSX has brew, which does the same; so we should not install them for the user, nor should we assist in installing new versions from pypi if the OS already provides the desired version. The user *should* use apt/dnf/brew/etc. for these on most platforms.

See https://commons.wikimedia.org/w/index.php?title=User:AbdealiJK/file-metadata&diff=199557008&oldid=199556866 , and please expand it to include other dependencies

@jayvdb: ...now I am a bit unhappy, since somehow we seem to be running into the same package installation mess that 'externals' was supposed to solve. Especially after your mail I feel the need to insist on creating a pythonic solution for "vendored" deps. Have you ever used vistrails? In my opinion it is very pythonic:

  1. simple installation, small footprint
  2. automatically detects missing deps
  3. cleverly finds and installs the right package on user wish and program need

that is what I want to see in pywikibot (now we are back to discussions from 2014 ;)) and python in general. What about another GSoC for next year?

To be very clear on that: IMHO pip and virtualenv are not needed at all! I never saw a real use for them - I am happy to use and work with them for this project, but if everything were coded and separated correctly, any python package would work out of the box from any folder. Every python application could then use the system packages for uncritical stuff and ship its own version of critical stuff in its own folder. That was the idea behind externals and how it used to work.

I really feel I want to solve that in a proper python package! I don't want to have the same issue again in 2020 ;)))

@AbdealiJK: (nod) How does this wheels stuff work? Will we have that at some point?

No, I haven't explored VisTrails in detail; I will. But as I mentioned on T138140, vendoring does have serious cons on Python - did you see the bugs I linked to? Yes, there is possibly a project in simplifying the vendoring process on Python and assisting in the ongoing maintenance of vendored libraries. I am not a fan of pip at all and would prefer a vendored solution, but it doesn't currently exist. Also, it would be outside the scope of the Wikimedia Foundation to build that. However, the Python Foundation is also part of GSoC, so it might be possible to run a project through them, or jointly with them; but the Python Foundation mostly sponsors projects improving its own existing code (not anything like pip), and in my experience the key people in the Python community who would need to be on board would be very opposed to anyone else running such a project.

But that debate and future possibility are not relevant to this GSoC project, as it is a core requirement of the project (as described at T129611) to use pip (pypi) where possible. Using OS-provided dependencies is an improvement on that, as it allows build-only dependencies and build-only steps to be eliminated from the user's concerns; e.g. cython is not needed to compile packages if the OS provides pre-compiled versions. And QA and support have already been done by the provider (pypi or the OS), which reduces the maintenance burden on the pywikibot/pywikibot-catimages teams.

@DrTrigon wheels are pre-compiled versions of a python package. As file-metadata itself does not need any compilation, they aren't really needed for our package. But our dependencies that have C code (skimage, numpy, scipy, dlib, matplotlib, wand) should normally have wheels to make them easy to install.

I've modified the instructions at the readme page quite a bit. https://commons.wikimedia.org/wiki/User:AbdealiJK/file-metadata I hope this reduces confusion.

@AbdealiJK: Sorry, maybe I should have made my question clearer: how can I use wheels to install file-metadata at the moment?

@DrTrigon There is no special step involved in installing with wheels. pip install <pkg name> generally prefers a wheel if one is found that is suitable for your system.
If no suitable wheel is found, it compiles the package from source.

For example, on Travis it uses the scikit wheel (.whl file) from pypi - line 632 of build 138811509.

The specific wheel it uses on Travis is scikit_image-0.12.3-cp27-cp27mu-manylinux1_x86_64.whl, which means (information from https://github.com/pypa/manylinux):

  • cp27 means CPython 2.7
  • m in cp27mu means that pymalloc is available
  • u in cp27mu means the python was compiled with UCS4 unicodes
  • x86_64 - 64bit (or arch 64 or x64) computers only
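Those pieces can be read straight off the filename: wheel names follow the pattern {name}-{version}(-{build})?-{python}-{abi}-{platform}.whl (PEP 427). A minimal parser, ignoring the optional build tag:

```python
def parse_wheel_filename(filename):
    """Split a wheel filename into its PEP 427 components."""
    if not filename.endswith(".whl"):
        raise ValueError("not a wheel: %s" % filename)
    parts = filename[:-len(".whl")].split("-")
    # the last three fields are always the python/abi/platform tags
    python_tag, abi_tag, platform_tag = parts[-3:]
    return {
        "name": parts[0],
        "version": parts[1],
        "python": python_tag,    # cp27 -> CPython 2.7
        "abi": abi_tag,          # cp27mu -> pymalloc + UCS4 build
        "platform": platform_tag,
    }

info = parse_wheel_filename(
    "scikit_image-0.12.3-cp27-cp27mu-manylinux1_x86_64.whl")
print(info["python"], info["abi"], info["platform"])
```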

Ah.
I just realized that in your case the reason the wheel is not being used is that you have a really old version of pip, which does not support wheels.

pip install -U pip can be used to upgrade pip. But again, I'd recommend doing it inside a virtualenv ... (otherwise apt-get's python-pip may cause confusion)

Test through pip:

$ sudo apt-get update
$ sudo apt-get purge python-pip; sudo apt-get autoremove
$ wget https://bootstrap.pypa.io/get-pip.py; sudo python get-pip.py
$ pip show pip
---
Metadata-Version: 2.0
Name: pip
Version: 8.1.2
[...]
$ sudo apt-get install perl openjdk-7-jre python-dev pkg-config libfreetype6-dev libpng12-dev liblapack-dev libblas-dev gfortran cmake libboost-python-dev liblzma-dev libjpeg-dev python-virtualenv
$ sudo pip install file-metadata
$ python -c'import file_metadata; print file_metadata.__version__'
0.1.0
$ sudo pip install pywikibot
$ wget https://gist.githubusercontent.com/AbdealiJK/a94fc0d0445c2ad715d9b1b95ec2ba03/raw/492ef4076d5af74b4855fd26f6810f14cff07ec9/file_metadata_bot.py
$ python pwb.py
python: can't open file 'pwb.py': [Errno 2] No such file or directory
  1. your hint regarding upgrading pip is correct, but the command is not; you need to uninstall it and manually reinstall it - see http://stackoverflow.com/questions/28917534/pip-broken-on-ubuntu-14-4-after-package-upgrade
  2. works outside virtualenv as well
  3. it uses wheels now ;))

Also modified the install instructions page a bit again: https://commons.wikimedia.org/w/index.php?title=User%3AAbdealiJK%2Ffile-metadata&type=revision&diff=199645641&oldid=199561344

How to call pwb.py? See T67176#2398644?

There shouldn't be any need for pwb.py .. it only adds a few UI
simplifications for frequent usage.
If the script.py is explicitly invoked, and user-config.py is created or
avoided with the env var, why else is it needed?

Ok, let me ask differently; How to create user-config.py?

Test through system package management:

$ sudo apt-get update
$ sudo apt-get purge python-pip; sudo apt-get autoremove
$ wget https://bootstrap.pypa.io/get-pip.py; sudo python get-pip.py
$ pip show pip
---
Metadata-Version: 2.0
Name: pip
Version: 8.1.2
[...]
$ sudo apt-get install python-appdirs python-magic python-numpy python-scipy python-matplotlib python-wand python-skimage python-zbar
$ sudo apt-get install cmake libboost-python-dev liblzma-dev
$ sudo pip install file-metadata
$ python -c'import file_metadata; print file_metadata.__version__'
0.1.0
  1. no need to install python-pip or python-setuptools (due to installing pip manually)
  2. need to install additional packages: cmake, libboost-python-dev, liblzma-dev (for dlib compilation)
  3. dlib gets compiled during pip install
    • @AbdealiJK: do you know why? do you want/need more verbose output for that?

Ok, despite some changes in the packages that need to be installed etc., I conclude: the installation of file-metadata is robust now! It works purely from pip or via package management. That looks very nice and is very cool!

Now I need to get user-config.py set-up and then I can work with and run the bot code.

config2.py doesn't need a user-config.py if an env var is set before import.
An app can use that, and modify pywikibot.config in other ways.

Or the old-fashioned way: document the steps the user follows to create the
user-config.py, or the app can provide its own init script to create it.

Can you give me some examples, please? I need to see this...

My situation is: I did a pip install pywikibot. Now I want to run any bot script to test whether everything is set up correctly. What is the way of doing that?

After having done

$ sudo pip install pywikibot

What is the next command to execute to test it? What is the command to execute a simple example bot script?

  1. install from package management according to T136985#2398708 OR pip by T136985#2398652 (w/o pywikibot)
  2. DO NOT install pywikibot through pip (this just duplicates code and causes confusion)
  3. install pywikibot FROM GIT according to Manual:Pywikibot/Gerrit
$ sudo apt-get install git git-review
$ git clone --branch 2.0 --recursive https://gerrit.wikimedia.org/r/pywikibot/core.git
$ cd core/
$ wget https://gist.githubusercontent.com/AbdealiJK/a94fc0d0445c2ad715d9b1b95ec2ba03/raw/492ef4076d5af74b4855fd26f6810f14cff07ec9/file_metadata_bot.py
$ python pwb.py basic
NOTE: 'user-config.py' was not found!
Please follow the prompts to create it:

Your default user directory is "/home/osboxes/core"
Do you want to use that directory? ([Y]es, [n]o): 
Do you want to copy user files from an existing Pywikibot installation? ([y]es, [N]o): 
Create user-config.py file? Required for running bots. ([Y]es, [n]o): 
 1: anarchopedia
 2: battlestarwiki
 3: commons
 4: i18n
 5: incubator
 6: lyricwiki
 7: mediawiki
 8: meta
 9: omegawiki
10: osm
11: outreach
12: species
13: strategy
14: test
15: vikidia
16: wikia
17: wikiapiary
18: wikibooks
19: wikidata
20: wikimedia
21: wikinews
22: wikipedia
23: wikiquote
24: wikisource
25: wikitech
26: wikiversity
27: wikivoyage
28: wiktionary
29: wowwiki
Select family of sites we are working on, just enter the number or name (default: wikipedia): 3
The only known language: commons
The language code of the site we're working on (default: commons): 
Username on commons:commons: DrTrigon        
Do you want to add any other projects? ([y]es, [N]o): 
Would you like the extended version of user-config.py, with explanations included? ([Y]es, [n]o): 
'/home/osboxes/core/user-config.py' written.
Create user-fixes.py file? Optional and for advanced users. ([y]es, [N]o): 
  1. to beta-test the bot and make sure it is working in the first place, run:
$ python file_metadata_bot.py -cat:SVG_files -limit:5
Traceback (most recent call last):
  File "file_metadata_bot.py", line 13, in <module>
    from file_metadata.image.image_file import ImageFile
  File "/usr/local/lib/python2.7/dist-packages/file_metadata/image/image_file.py", line 28, in <module>
    warnings.simplefilter('error', Image.DecompressionBombWarning)
AttributeError: 'module' object has no attribute 'DecompressionBombWarning'
<type 'exceptions.AttributeError'>
CRITICAL: Waiting for 1 network thread(s) to finish. Press ctrl-c to abort

@AbdealiJK:

  • We need clear and easy instructions on how to install pywikibot and configure it for use along with file-metadata and the testing bot script. If I get confused, I'm pretty sure beta testers will as well... ;)
  • I got an error for ImageFile (see above) - it does not matter whether I install through package management or pip

@jayvdb: Had issues installing pywikibot; the following seems not to work anymore - what is the workflow nowadays? How can I get a repo from which I can push to gerrit as well?

$ git clone --recursive ssh://drtrigon@gerrit.wikimedia.org:29418/pywikibot/core.git
Cloning into 'core'...
ssh: connect to host gerrit.wikimedia.org port 29418: Network is unreachable
fatal: Could not read from remote repository.

Please make sure you have the correct access rights
and the repository exists.

Yeah, I think the pywikibot thing is a bit confusing too. I assumed pywikibot veterans would be trying that. I'd like @jayvdb's opinion on how a script not inside /pywikibot/scripts is supposed to use pywikibot. Is there an example project I could look at?

(Currently import pywikibot assumes a user-config.py exists ... and is created by the user)

The error in ImageFile is documented at https://github.com/AbdealiJK/file-metadata/issues/33 and fixed with a better warning message in master (not fixed in v0.1.0, which is on pypi, though).

I tried to use the github file-metadata by doing:

$ sudo apt-get install python-matplotlib python-zbar
$ git clone https://github.com/AbdealiJK/file-metadata.git
$ ln -s file-metadata/file_metadata file_metadata
$ python -c'import file_metadata; print file_metadata.__version__'
0.1.0.dev99999999999999
$ python file_metadata_bot.py -cat:SVG_files -limit:5
Traceback (most recent call last):
  File "file_metadata_bot.py", line 13, in <module>
    from file_metadata.image.image_file import ImageFile
  File "/home/osboxes/core/file_metadata/image/image_file.py", line 34, in <module>
    warnings.simplefilter('error', Image.DecompressionBombWarning)
AttributeError: 'module' object has no attribute 'DecompressionBombWarning'
<type 'exceptions.AttributeError'>
CRITICAL: Waiting for 1 network thread(s) to finish. Press ctrl-c to abort

Maybe it messes something up because I am not using a virtualenv?

Ouch, sorry - I just read the error message properly. It seems the Pillow version is really old? (It doesn't seem to have a warning called DecompressionBombWarning.) This was added in Pillow 2.5.0 according to the release notes; I'll add a minimum version to the requirements.txt.
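Besides pinning a minimum version, the registration itself can be guarded so that older Pillow (< 2.5.0, which lacks the DecompressionBombWarning attribute) degrades gracefully instead of raising AttributeError at import time. A sketch, written against a module object so it is testable without Pillow, and not the actual file-metadata patch:

```python
import warnings

def enable_decompression_bomb_errors(image_module):
    """Escalate Pillow's DecompressionBombWarning to an error, if present.

    Pillow gained DecompressionBombWarning in 2.5.0; on older versions
    the attribute is missing, so we skip the filter instead of crashing.
    Returns True when the filter was installed.
    """
    bomb = getattr(image_module, "DecompressionBombWarning", None)
    if bomb is None:
        return False
    warnings.simplefilter("error", bomb)
    return True

# real usage (assumes Pillow is importable):
#   from PIL import Image
#   enable_decompression_bomb_errors(Image)
```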

Also, note that the latest version of file-metadata in git will not be compatible with file_metadata_bot.py, as analyze_barcode() has been removed in favour of the two new functions analyze_barcode_zxing() and analyze_barcode_zbar().

@AbdealiJK: Thanks for the info. Given that I would like to update the bot script, what is the workflow on GitHub? Clone, modify, commit and then push, or how do you work? I have never worked on GitHub until now... ;)

First successful beta test from my side:

  1. I had to use $ sudo pip install file-metadata --upgrade in order to get the most recent version of Pillow (is this the expected behaviour? Or was my try too early, before requirements.txt was updated?)
  2. Run then gave me a nice error message:
$ python file_metadata_bot.py -cat:SVG_files -limit:5
Retrieving 5 pages from commons:commons.
Traceback (most recent call last):
  File "file_metadata_bot.py", line 103, in <module>
    main(*sys.argv)
  File "file_metadata_bot.py", line 97, in main
    log = handle_page(page)
  File "file_metadata_bot.py", line 66, in handle_page
    _file = GenericFile.create(file_path)
  File "/usr/local/lib/python2.7/dist-packages/file_metadata/generic_file.py", line 76, in create
    return ImageFile.create(*args, **kwargs)
  File "/usr/local/lib/python2.7/dist-packages/file_metadata/image/image_file.py", line 56, in create
    from file_metadata.image.svg_file import SVGFile
  File "/usr/local/lib/python2.7/dist-packages/file_metadata/image/svg_file.py", line 9, in <module>
    import wand.image
  File "/usr/local/lib/python2.7/dist-packages/wand/image.py", line 20, in <module>
    from .api import MagickPixelPacket, libc, libmagick, library
  File "/usr/local/lib/python2.7/dist-packages/wand/api.py", line 206, in <module>
    'Try to install:\n  ' + msg)
ImportError: MagickWand shared library not found.
You probably had not installed ImageMagick library.
Try to install:
  apt-get install libmagickwand-dev
<type 'exceptions.ImportError'>
CRITICAL: Waiting for 1 network thread(s) to finish. Press ctrl-c to abort
  3. $ sudo apt-get install libmagickwand-dev
  4. Run then gave me:
$ python file_metadata_bot.py -cat:SVG_files -limit:5
Retrieving 5 pages from commons:commons.
==== [[:File:"Single"Japanese castle Tenshu layout format.svg]] ====
* '''Mimetype:''' image/svg+xml
Time taken to analyze: 1.33984sec

==== [[:File:9th Odia Wikipedia Anniversary.svg]] ====
* '''Mimetype:''' image/svg+xml
Time taken to analyze: 0.968834sec

==== [[:File:A Dine Fork.svg]] ====
* '''Mimetype:''' image/svg+xml
Time taken to analyze: 0.926564sec

==== [[:File:A conceptual drawing of a barge with very simple geometry.svg]] ====
* '''Mimetype:''' image/svg+xml
Time taken to analyze: 1.023462sec

Traceback (most recent call last):
  File "file_metadata_bot.py", line 103, in <module>
    main(*sys.argv)
  File "file_metadata_bot.py", line 97, in main
    log = handle_page(page)
  File "file_metadata_bot.py", line 68, in handle_page
    txt.append('==== {0} ===='.format(page.title(asLink=True, textlink=True)))
UnicodeEncodeError: 'ascii' codec can't encode character u'\xe7' in position 13: ordinal not in range(128)
VERBOSE:pywiki:Dropped throttle(s).
<type 'exceptions.UnicodeEncodeError'>
CRITICAL: Waiting for 1 network thread(s) to finish. Press ctrl-c to abort
CRITICAL:pywiki:Waiting for 1 network thread(s) to finish. Press ctrl-c to abort
VERBOSE:pywiki:All threads finished.
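The UnicodeEncodeError above is the classic Python 2 pitfall of formatting a non-ASCII unicode page title through byte-string machinery, which triggers an implicit ASCII encode. A minimal sketch of the later fix (forcing the template and value to unicode end to end); the example title is taken from a later run in this log, and the code runs unchanged on Python 3 where all str formatting is unicode anyway:

```python
# -*- coding: utf-8 -*-
# On Python 2, mixing a byte-string template with a title containing
# e.g. u'\xe7' can raise UnicodeEncodeError ('ascii' codec can't encode).
# Keeping everything unicode avoids the implicit ASCII step entirely.
title = u'File:A Gon\xe7alves Dias (cursive).svg'
line = u'==== {0} ===='.format(title)
print(line)
```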

Opened 2 issues:

Actually I haven't updated the requirements.txt on the pypi version or in master yet. Not sure what the issue was, but it seems like pip resolved it.

To edit the code, you have 2 options:

Method 1: For most contributors

  1. You fork the repository into your own account.
  2. Clone your fork, and create a branch if you'd like to modify code in a branch (that's normally easier later down the line).
  3. Modify the code and commit it (the usual workflow).
  4. Push the code to your fork (to the appropriate branch).
  5. Create a Pull Request to my repository's master branch from your fork's appropriate branch.

Method 2: I've given you and jayvdb write access to the repository. So you can directly push code to master if you like, but I'd suggest you not do that, because I'd rather review the code before it gets pushed to master (or else there's bound to be confusion).

Here, the process would be:

  1. Clone the AbdealiJK/file-metadata repository
  2. Create a new branch, preferably prefixed with your name or some prefix specific to you (I normally use ajk for my branches), and add a / to separate the prefix from the branch name. (For example, ajk/travis would have code written by me related to travis, etc.)
  3. Modify, commit and push to your branch (the usual workflow).
  4. Create a Pull Request with your branch which asks to merge the branch with master.
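The branch-naming convention from step 2 can be tried out in a throwaway local repository; this is a dry run with no network access, and the paths, names and email below are made up:

```shell
# Create a throwaway repo and demonstrate the <prefix>/<topic> convention.
rm -rf /tmp/fm-branch-demo && mkdir -p /tmp/fm-branch-demo
cd /tmp/fm-branch-demo
git init -q .
git config user.email demo@example.com
git config user.name demo
echo demo > README
git add README && git commit -qm "initial commit"
# A branch named like ajk/travis (prefix 'ajk', topic 'travis'):
git checkout -qb ajk/travis
git rev-parse --abbrev-ref HEAD
```

In the real workflow the clone would come from the AbdealiJK/file-metadata repository (or your fork) and the push would go to that remote.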

Possibly, the following are good references:

DrTrigon reopened this task as Open.
DrTrigon claimed this task.

Thanks for all your help! My preferred way of installation and testing is documented at: https://commons.wikimedia.org/wiki/User:DrTrigon/file-metadata

It works by using the script from: https://gist.github.com/drtrigon/2dcbc5fbac1e00f0f89dec9343994e48

$ wget https://gist.githubusercontent.com/AbdealiJK/a94fc0d0445c2ad715d9b1b95ec2ba03/raw/1dcd1fb8c168608c28e20ff50e9284700f61b90d/file_metadata_bot.py
$ wget https://gist.githubusercontent.com/drtrigon/0002517ea812cc707e6ea2ecaf23d9b3/raw/84bee2be6ba918e0c86cadafe7c40d31a4da30f8/file_metadata_bot.diff; patch -p1 < file_metadata_bot.diff
$ python file_metadata_bot.py -cat:SVG_files -limit:5
Retrieving 5 pages from commons:commons.
==== [[:File:"Single"Japanese castle Tenshu layout format.svg]] ====
* '''Mimetype:''' image/svg+xml
Time taken to analyze: 0.667123sec

==== [[:File:9th Odia Wikipedia Anniversary.svg]] ====
* '''Mimetype:''' image/svg+xml
Time taken to analyze: 0.571316sec

==== [[:File:A Dine Fork.svg]] ====
* '''Mimetype:''' image/svg+xml
Time taken to analyze: 0.646316sec

==== [[:File:A conceptual drawing of a barge with very simple geometry.svg]] ====
* '''Mimetype:''' image/svg+xml
Time taken to analyze: 0.721252sec

==== [[:File:A Gonçalves Dias (cursive).svg]] ====
* '''Mimetype:''' image/svg+xml
Time taken to analyze: 0.275983sec

VERBOSE:pywiki:Dropped throttle(s).
VERBOSE:pywiki:Waiting for 1 network thread(s) to finish. Press ctrl-c to abort
VERBOSE:pywiki:All threads finished.

$ python pwb.py listpages -search:'eth-bib'

$ python file_metadata_bot.py -search:'eth-bib' -limit:5
Retrieving 5 pages from commons:commons.
==== [[:File:Alfred Werner ETH-Bib Portr 09965.jpg]] ====
* '''Mimetype:''' image/jpeg
Time taken to analyze: 9.570139sec

==== [[:File:Augusto Gansser ETH-Bib Dia 022-005.jpg]] ====
* '''Mimetype:''' image/jpeg
Time taken to analyze: 1.007276sec

==== [[:File:Aurel Stodola ETH-Bib Portr 06960.jpg]] ====
* '''Mimetype:''' image/jpeg
Time taken to analyze: 1.065649sec

==== [[:File:Aurel Stodola ETH-Bib Portr 09556.jpg]] ====
* '''Mimetype:''' image/jpeg
Time taken to analyze: 0.94153sec

==== [[:File:ERMETH ETH-Bib Ans 00290.jpg]] ====
* '''Mimetype:''' image/jpeg
Time taken to analyze: 0.954107sec

VERBOSE:pywiki:Dropped throttle(s).
VERBOSE:pywiki:Waiting for 1 network thread(s) to finish. Press ctrl-c to abort
VERBOSE:pywiki:All threads finished.

@AbdealiJK: How can I run it and produce output similar to what you get from travis, like face detection etc.? Is this possible?

@DrTrigon you can find the code for that at https://github.com/AbdealiJK/file-metadata/blob/95cc2abb3506608266b1faf0da0722433ad6b03b/tests/bulk.py
Note that it has some extra args:

  • -logname - The logname to write to. Used as -logname:Some_Name which tells it to write to User:<LoggedInUserInUserConfig>/logs/Some_Name.
  • -dryrun - Print the log rather than writing to the userspace. Needs to be used as -dryrun:1

PS: I have not tested this script rigorously, there are no unittests, it's not "supported", etc. Use at your own risk.

@AbdealiJK: Thanks a lot! Had to patch it a bit as you mentioned, see https://gist.github.com/drtrigon/a1945629d1e7d7f566045629a43c0b06

First I got this error, which I solved by adding some unicode() (see patch also):

$ python bulk.py -search:'eth-bib' -limit:5 -logname:test -dryrun:1
Retrieving 5 pages from commons:commons.
1 . Analyzing File:Alfred Werner ETH-Bib Portr 09965.jpg
ERROR:root:ufunc 'add' did not contain a loop with signature matching types dtype('<U32') dtype('<U32') dtype('<U32')
Traceback (most recent call last):
  File "bulk.py", line 251, in run
    log += self._test_file(page, path)
  File "bulk.py", line 137, in _test_file
    col_info['Color:EdgeRatio'])
TypeError: ufunc 'add' did not contain a loop with signature matching types dtype('<U32') dtype('<U32') dtype('<U32')
2 . Analyzing File:Augusto Gansser ETH-Bib Dia 022-005.jpg
ERROR:root:ufunc 'add' did not contain a loop with signature matching types dtype('<U32') dtype('<U32') dtype('<U32')
Traceback (most recent call last):
  File "bulk.py", line 251, in run
    log += self._test_file(page, path)
  File "bulk.py", line 137, in _test_file
    col_info['Color:EdgeRatio'])
TypeError: ufunc 'add' did not contain a loop with signature matching types dtype('<U32') dtype('<U32') dtype('<U32')
3 . Analyzing File:Aurel Stodola ETH-Bib Portr 06960.jpg
ERROR:root:ufunc 'add' did not contain a loop with signature matching types dtype('<U32') dtype('<U32') dtype('<U32')
Traceback (most recent call last):
  File "bulk.py", line 251, in run
    log += self._test_file(page, path)
  File "bulk.py", line 137, in _test_file
    col_info['Color:EdgeRatio'])
TypeError: ufunc 'add' did not contain a loop with signature matching types dtype('<U32') dtype('<U32') dtype('<U32')
4 . Analyzing File:Aurel Stodola ETH-Bib Portr 09556.jpg
ERROR:root:ufunc 'add' did not contain a loop with signature matching types dtype('<U32') dtype('<U32') dtype('<U32')
Traceback (most recent call last):
  File "bulk.py", line 251, in run
    log += self._test_file(page, path)
  File "bulk.py", line 137, in _test_file
    col_info['Color:EdgeRatio'])
TypeError: ufunc 'add' did not contain a loop with signature matching types dtype('<U32') dtype('<U32') dtype('<U32')
5 . Analyzing File:ERMETH ETH-Bib Ans 00290.jpg
ERROR:root:ufunc 'add' did not contain a loop with signature matching types dtype('<U32') dtype('<U32') dtype('<U32')
Traceback (most recent call last):
  File "bulk.py", line 251, in run
    log += self._test_file(page, path)
  File "bulk.py", line 137, in _test_file
    col_info['Color:EdgeRatio'])
TypeError: ufunc 'add' did not contain a loop with signature matching types dtype('<U32') dtype('<U32') dtype('<U32')

VERBOSE:pywiki:Dropped throttle(s).
VERBOSE:pywiki:Waiting for 1 network thread(s) to finish. Press ctrl-c to abort
VERBOSE:pywiki:All threads finished.
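The TypeError in this run comes from the '+' operator dispatching to NumPy, whose string dtypes (like '<U32') do not implement the 'add' ufunc. The patch's fix of wrapping each value in unicode() can be sketched NumPy-free (str() being the Python 3 spelling); the helper name is made up:

```python
def log_line(label, value):
    """Append an arbitrary analysis value to a log label.

    Coercing the value to a plain text string first makes '+' ordinary
    string concatenation, even when the value came out of a NumPy string
    array (dtype '<U32'), whose own '+' raises
    "ufunc 'add' did not contain a loop with signature matching types ...".
    """
    return label + str(value)  # the patch used unicode(value) on Python 2


print(log_line(u'Edge Ratio: ', 0.32789208962))
```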

and then I succeeded ;)

$ python bulk.py -search:'eth-bib' -limit:5 -logname:test -dryrun:1
Retrieving 5 pages from commons:commons.
1 . Analyzing File:Alfred Werner ETH-Bib Portr 09965.jpg
WARNING: /home/osboxes/core/file_metadata/utilities.py:88: UnicodeWarning: Unicode equal comparison failed to convert both arguments to Unicode - interpreting them as being unequal
  for data in iter(lambda: _file.read(block_size), ''):

WARNING:py.warnings:/home/osboxes/core/file_metadata/utilities.py:88: UnicodeWarning: Unicode equal comparison failed to convert both arguments to Unicode - interpreting them as being unequal
  for data in iter(lambda: _file.read(block_size), ''):

WARNING:root:HAAR Cascade analysis requires the optional dependency OpenCV 2.x to be installed.
2 . Analyzing File:Augusto Gansser ETH-Bib Dia 022-005.jpg
WARNING:root:HAAR Cascade analysis requires the optional dependency OpenCV 2.x to be installed.
3 . Analyzing File:Aurel Stodola ETH-Bib Portr 06960.jpg
WARNING:root:HAAR Cascade analysis requires the optional dependency OpenCV 2.x to be installed.
4 . Analyzing File:Aurel Stodola ETH-Bib Portr 09556.jpg
WARNING:root:HAAR Cascade analysis requires the optional dependency OpenCV 2.x to be installed.
5 . Analyzing File:ERMETH ETH-Bib Ans 00290.jpg
WARNING:root:HAAR Cascade analysis requires the optional dependency OpenCV 2.x to be installed.


==== [[:File:Alfred Werner ETH-Bib Portr 09965.jpg]] ====
{| class="wikitable"
|
* '''Mime Type''': image/jpeg
* '''Average RGB value''': 69.131, 68.688, 65.987
* '''Closest Pantone color''': PMS 19-0405 TPX (Beluga)
* '''Edge Ratio''': 0.32789208962
* '''Number of grey shades used''': 47
* '''Percent of the palette occuring frequently''': 0.141176470588
* Face #1
** Using: dlib/flandmark
** Score: 1.02
** Bounding Box: Left:187, Top:401, Width:643, Height:643
* '''Time taken''': 36.735373 sec
|
<div style="position:relative;">
[[File:Alfred Werner ETH-Bib Portr 09965.jpg|136x200px]]
<div class="position-marker file-meta-face" style="position:absolute; left:18px; top:39px; width:62px; height:62px; border:2px solid #00ff00;"></div>
</div>
|}


==== [[:File:Augusto Gansser ETH-Bib Dia 022-005.jpg]] ====
{| class="wikitable"
|
* '''Mime Type''': image/jpeg
* '''Average RGB value''': 153.041, 149.583, 142.662
* '''Closest Pantone color''': PMS 16-4400 TPX (Mourning Dove)
* '''Edge Ratio''': 0.233318514327
* '''Number of grey shades used''': 210
* '''Percent of the palette occuring frequently''': 0.247058823529
* Face #1
** Using: dlib/flandmark
** Score: 0.57
** Bounding Box: Left:246, Top:112, Width:87, Height:87
* '''Time taken''': 2.109742 sec
|
<div style="position:relative;">
[[File:Augusto Gansser ETH-Bib Dia 022-005.jpg|200x166px]]
<div class="position-marker file-meta-face" style="position:absolute; left:82px; top:37px; width:29px; height:29px; border:2px solid #00ff00;"></div>
</div>
|}


==== [[:File:Aurel Stodola ETH-Bib Portr 06960.jpg]] ====
{| class="wikitable"
|
* '''Mime Type''': image/jpeg
* '''Average RGB value''': 175.903, 169.415, 143.133
* '''Closest Pantone color''': PMS 15-0513 TPX (Eucalyptus)
* '''Edge Ratio''': 0.276959453902
* '''Number of grey shades used''': 127
* '''Percent of the palette occuring frequently''': 0.354248366013
* Face #1
** Using: dlib/flandmark
** Score: 1.62
** Bounding Box: Left:134, Top:134, Width:216, Height:216
* '''Time taken''': 2.300966 sec
|
<div style="position:relative;">
[[File:Aurel Stodola ETH-Bib Portr 06960.jpg|145x200px]]
<div class="position-marker file-meta-face" style="position:absolute; left:44px; top:44px; width:72px; height:72px; border:2px solid #00ff00;"></div>
</div>
|}


==== [[:File:Aurel Stodola ETH-Bib Portr 09556.jpg]] ====
{| class="wikitable"
|
* '''Mime Type''': image/jpeg
* '''Average RGB value''': 106.645, 101.958, 94.981
* '''Closest Pantone color''': PMS 19-0808 TPX (Morel)
* '''Edge Ratio''': 0.271034443131
* '''Number of grey shades used''': 80
* '''Percent of the palette occuring frequently''': 0.0941176470588
* Face #1
** Using: dlib/flandmark
** Score: 0.31
** Bounding Box: Left:104, Top:161, Width:259, Height:259
* '''Time taken''': 2.072901 sec
|
<div style="position:relative;">
[[File:Aurel Stodola ETH-Bib Portr 09556.jpg|146x200px]]
<div class="position-marker file-meta-face" style="position:absolute; left:34px; top:53px; width:86px; height:86px; border:2px solid #00ff00;"></div>
</div>
|}


==== [[:File:ERMETH ETH-Bib Ans 00290.jpg]] ====
{| class="wikitable"
|
* '''Mime Type''': image/jpeg
* '''Average RGB value''': 122.335, 121.702, 118.865
* '''Closest Pantone color''': PMS 18-5102 TPX (Brushed Nickel)
* '''Edge Ratio''': 0.224917441468
* '''Number of grey shades used''': 255
* '''Percent of the palette occuring frequently''': 0.990849673203
* '''Time taken''': 1.438128 sec
|
<div style="position:relative;">
[[File:ERMETH ETH-Bib Ans 00290.jpg|144x200px]]
</div>
|}
VERBOSE:pywiki:Dropped throttle(s).
VERBOSE:pywiki:Waiting for 1 network thread(s) to finish. Press ctrl-c to abort
VERBOSE:pywiki:All threads finished.

As you can see, there are basically two (or three?) different warnings:

WARNING: /home/osboxes/core/file_metadata/utilities.py:88: UnicodeWarning: Unicode equal comparison failed to convert both arguments to Unicode - interpreting them as being unequal
  for data in iter(lambda: _file.read(block_size), ''):

WARNING:py.warnings:/home/osboxes/core/file_metadata/utilities.py:88: UnicodeWarning: Unicode equal comparison failed to convert both arguments to Unicode - interpreting them as being unequal
  for data in iter(lambda: _file.read(block_size), ''):

WARNING:root:HAAR Cascade analysis requires the optional dependency OpenCV 2.x to be installed.

How to install this "optional dependency OpenCV 2.x"?

@AbdealiJK: The utilities.py:88 warning seems not to be reproducible. I deleted bulk.py as well as files/ and then downloaded and patched bulk.py again. Afterwards these warnings did not appear anymore... (?)

The simplest way to install OpenCV would be sudo apt-get install python-opencv. I'm still working on making dependency handling slightly better, along with https://github.com/AbdealiJK/file-metadata/issues/46

The utilities.py:88 issue happens while the zxing and dlib data are being downloaded. So, after the first time they get downloaded, it won't occur anymore. I have fixed it in my local repo (I was revamping related code and happened to notice this too).
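The "optional dependency" warning pattern seen in the log can be replicated with a guarded import; this is a sketch, the function name and empty return value are made up, and only the cv2 module name and the warning text come from the log above:

```python
import logging

try:
    import cv2  # provided by `sudo apt-get install python-opencv`
    HAVE_OPENCV = True
except ImportError:
    HAVE_OPENCV = False


def analyze_facial_landmarks(path):
    """Run HAAR cascade face detection if OpenCV is available,
    otherwise warn (as file-metadata does) and return nothing."""
    if not HAVE_OPENCV:
        logging.warning('HAAR Cascade analysis requires the optional '
                        'dependency OpenCV 2.x to be installed.')
        return []
    # ... real analysis using cv2 cascade classifiers would go here ...
    return []


print(isinstance(HAVE_OPENCV, bool))  # → True
```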

Oh, ok - I always mix up cv and cv2... ;) Installing python-opencv did the trick, thanks!

Where does it download the data to?

Currently ~/.local/share/file-metadata or something like that. It's the "default location" where the OS lets apps put user data. (It's determined by the package called appdirs.)
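That location can be computed with appdirs directly. This is a sketch: the XDG-style fallback (for when appdirs isn't installed) and the exact app name 'file-metadata' are assumptions:

```python
import os

try:
    from appdirs import user_data_dir  # the package mentioned above
except ImportError:
    def user_data_dir(appname, appauthor=None):
        # Rough fallback matching appdirs' Linux (XDG) behaviour.
        base = os.environ.get('XDG_DATA_HOME',
                              os.path.expanduser('~/.local/share'))
        return os.path.join(base, appname)

# On Linux this typically yields something like ~/.local/share/file-metadata
print(user_data_dir('file-metadata'))
```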

@DrTrigon: I am resetting the assignee of this task because there has not been progress lately (please correct me if I am wrong!). Resetting the assignee avoids the impression that somebody is already working on this task. It also allows others to potentially work towards fixing this task. Please claim this task again when you plan to work on it (via Add Action...Assign / Claim in the dropdown menu) - it would be welcome. Thanks for your understanding!