file-metadata should be enhanced with language recognition. For example, use poppler (or any similar tool) to extract text from PDF, SVG, and other files, then pass that text to one of the packages listed below to identify its language.
See also:
- http://blog.alejandronolla.com/2013/05/15/detecting-text-language-with-python-and-nltk/
- https://pypi.python.org/pypi/langdetect (Python port of Google's language-detection library)
- https://github.com/saffsd/langid.py
- https://github.com/kent37/guess-language
- Text Classification by Aggregation of SVD Eigenvectors: http://delab.csd.auth.gr/papers/ADBIS2012skm.pdf
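A rough sketch of how the extract-then-detect pipeline could look, not a proposal for the final file-metadata API. It assumes poppler's `pdftotext` command-line tool is on the PATH and that the `langdetect` package is installed. The function names (`extract_pdf_text`, `extract_svg_text`, `guess_language`) and the `min_chars` threshold are hypothetical and chosen only for illustration.

```python
import subprocess
import xml.etree.ElementTree as ET


def extract_pdf_text(path):
    """Return the text of a PDF via poppler's pdftotext ("-" writes to stdout)."""
    result = subprocess.run(
        ["pdftotext", path, "-"],
        capture_output=True, text=True, check=True,
    )
    return result.stdout


def extract_svg_text(svg_source):
    """Return the character data of every <text> element in an SVG string."""
    root = ET.fromstring(svg_source)
    chunks = []
    for elem in root.iter():
        # Tags are usually namespaced, e.g. "{http://www.w3.org/2000/svg}text".
        if elem.tag.rsplit("}", 1)[-1] == "text":
            # itertext() also picks up nested <tspan> content.
            chunks.append(" ".join("".join(elem.itertext()).split()))
    return "\n".join(c for c in chunks if c)


def guess_language(text, detector=None, min_chars=20):
    """Return a language code for text, or None if there is too little text.

    detector is any callable mapping a string to a language code. It
    defaults to langdetect.detect (assumption: langdetect is installed).
    min_chars is an arbitrary illustrative cut-off, not a tuned value.
    """
    text = text.strip()
    if len(text) < min_chars:
        return None
    if detector is None:
        from langdetect import detect  # third-party: pip install langdetect
        detector = detect
    return detector(text)
```

Taking the detector as a parameter would make it easy to compare langdetect against the other packages listed above; for instance, `lambda t: langid.classify(t)[0]` could be passed in to try langid.py instead.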
Possible Categories to put for the bot:
Details:
Primary mentor:
Co-mentor:
Other mentors: (optional, Phabricator username)
Skills: (python and computer vision)
Estimated project time for a senior contributor: (2-4 weeks)