@Theklan reports that a good dictionary is available for Basque. Let's use it to implement a set of dictionary-based features.
Also, there are some paragraphs of English and Spanish that we might want to catch.
English or Spanish inside a <ref> tag is OK, but not in the rest of the content.
Proposal:
- Merge https://github.com/wikimedia/revscoring/pull/400
- Edit https://github.com/wikimedia/articlequality/blob/master/articlequality/feature_lists/euwiki.py
- Add new features for the proportion of words that match Basque, English, and Spanish dictionaries.
- Train and test to compare results.
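
A rough sketch of what the proportion features could compute, in plain Python rather than revscoring's feature API. The three word sets are tiny hypothetical stand-ins for real dictionaries, and the function names and the regex-based <ref> stripping are assumptions for illustration, not how revscoring actually handles it:

```python
import re

# Hypothetical toy word lists; a real feature would use full dictionaries.
BASQUE_WORDS = {"etxea", "handia", "da", "eta"}
ENGLISH_WORDS = {"the", "house", "is", "big", "and"}
SPANISH_WORDS = {"la", "casa", "es", "grande", "y"}

# Matches self-closing <ref ... /> tags and <ref ...>...</ref> pairs.
# The \b keeps <references /> from being treated as a <ref> tag.
REF_RE = re.compile(r"<ref\b[^>]*/>|<ref\b[^>]*>.*?</ref>",
                    re.DOTALL | re.IGNORECASE)

# Runs of Unicode letters (no digits or underscores).
WORD_RE = re.compile(r"[^\W\d_]+")


def strip_refs(wikitext):
    """Drop <ref> content, since English/Spanish is acceptable there."""
    return REF_RE.sub(" ", wikitext)


def dict_proportion(words, dictionary):
    """Fraction of words found in the given dictionary (0.0 if no words)."""
    if not words:
        return 0.0
    return sum(1 for w in words if w in dictionary) / len(words)


def language_proportions(wikitext):
    """Per-language dictionary match proportions outside <ref> tags."""
    words = [w.lower() for w in WORD_RE.findall(strip_refs(wikitext))]
    return {
        "basque": dict_proportion(words, BASQUE_WORDS),
        "english": dict_proportion(words, ENGLISH_WORDS),
        "spanish": dict_proportion(words, SPANISH_WORDS),
    }


if __name__ == "__main__":
    sample = "Etxea handia da. <ref>The house is big</ref> La casa es grande."
    print(language_proportions(sample))
```

In the sample, the English sentence sits inside a <ref> tag and so does not count toward the English proportion, while the Spanish sentence in the body does count toward the Spanish one. Other markup (templates, links, other tags) is not handled here and would need real wikitext parsing.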