The script extracts the articles listed under "articles needing copy edit". For each article, it computes the Flesch-Kincaid readability score and retrieves the page view count. It then standardizes each of these two metrics and adds them to get a final combined score. Articles with high page view counts and poor readability rank highest. The top 20% of the ranked articles are retrieved and made into questions.
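The scoring and ranking step could look roughly like the minimal sketch below. It makes a few assumptions. It uses the Flesch-Kincaid grade level, where a higher value means harder to read, so it can be added directly to standardized page views. If the Flesch Reading Ease score were used instead, it would need to be negated first. It uses z-score standardization. It also uses a crude vowel-group syllable counter as a stand-in for syllables_en.py. The function names and the `articles` input shape (title mapped to a (pageviews, text) pair) are hypothetical and are not taken from copy_edit.py.

```python
import re
import statistics


def count_syllables(word):
    """Crude vowel-group heuristic (hypothetical stand-in, not syllables_en.py)."""
    word = word.lower()
    count = len(re.findall(r"[aeiouy]+", word))
    # A trailing silent 'e' usually adds no syllable ("make"), but "-le" does ("table").
    if word.endswith("e") and not word.endswith("le") and count > 1:
        count -= 1
    return max(count, 1)


def fk_grade_level(text):
    """Flesch-Kincaid grade level: higher means harder to read."""
    # Naive sentence split on . ! ? and naive word match; real tokenization is harder.
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    if not sentences or not words:
        return 0.0
    syllables = sum(count_syllables(w) for w in words)
    return (0.39 * len(words) / len(sentences)
            + 11.8 * syllables / len(words)
            - 15.59)


def zscores(values):
    """Standardize to mean 0 and (population) standard deviation 1."""
    mean = statistics.mean(values)
    sd = statistics.pstdev(values)
    if sd == 0:
        return [0.0 for _ in values]
    return [(v - mean) / sd for v in values]


def top_fraction(articles, fraction=0.2):
    """Rank titles by z(pageviews) + z(grade level) and keep the top fraction.

    articles: dict mapping title -> (pageviews, text)  # hypothetical shape
    Returns a list of (title, combined_score), best first.
    """
    titles = list(articles)
    views = [articles[t][0] for t in titles]
    grades = [fk_grade_level(articles[t][1]) for t in titles]
    combined = [v + g for v, g in zip(zscores(views), zscores(grades))]
    ranked = sorted(zip(titles, combined), key=lambda p: p[1], reverse=True)
    # Round to the nearest whole article, keeping at least one.
    k = max(1, round(len(ranked) * fraction))
    return ranked[:k]
```

Standardizing before adding keeps raw page views, which can run into the thousands, from swamping grade levels, which usually sit in the single or low double digits.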
The commit is here.
The related files are:
copy_edit.py - main script that does the computations and generates questions
syllables_en.py - helper script to get syllables in a piece of text
utils.py - helper script to get the words in a sentence, the syllable count, the sentence count, etc.
copy_edit_ranking.pkl - generated pickle file of the final article rankings, stored as an ordered dict of the form:
    dict[title] = [link, pageview, fk score, added standardized score]
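The sketch below shows one way to build, save, and read back a ranking in the layout above using the standard `pickle` module. It assumes the ordered dict runs from the highest to the lowest added standardized score. The `rows` tuple shape and the function names are hypothetical and are not taken from copy_edit.py.

```python
import pickle
from collections import OrderedDict


def build_ranking(rows):
    """rows: iterable of (title, link, pageviews, fk_score, combined_score).

    Returns an OrderedDict mapping title -> [link, pageview, fk score,
    added standardized score], ordered from highest to lowest combined score.
    """
    ordered = sorted(rows, key=lambda r: r[4], reverse=True)
    return OrderedDict(
        (title, [link, views, fk, combined])
        for title, link, views, fk, combined in ordered
    )


def save_ranking(ranking, path="copy_edit_ranking.pkl"):
    """Serialize the ranking to a binary pickle file."""
    with open(path, "wb") as f:
        pickle.dump(ranking, f)


def load_ranking(path="copy_edit_ranking.pkl"):
    """Load a ranking previously written by save_ranking."""
    with open(path, "rb") as f:
        return pickle.load(f)
```

Reading the top n entries back is then `list(load_ranking().items())[:n]`.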