The "Page History" tool provides prose size character and word counts, but it doesn't correctly ignore <style> (via TemplateStyles) and <math> tags. I recently created https://prosesize.toolforge.org/, which has an API; I would suggest pulling counts from there instead. I also wrote a blog post with some more detail.
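To illustrate the kind of filtering meant here, the following is a minimal Python sketch that drops everything inside <style> and <math> elements before counting. It is not the code behind prosesize or XTools; the class name `ProseExtractor` and the function `prose_text` are hypothetical, and it uses only the standard library `html.parser` module.

```python
from html.parser import HTMLParser


class ProseExtractor(HTMLParser):
    """Collect text, skipping anything nested inside <style> or <math>.

    Hypothetical helper for illustration; not taken from prosesize or XTools.
    """

    SKIPPED = {"style", "math"}

    def __init__(self):
        super().__init__()
        self.depth = 0    # > 0 while inside a skipped element
        self.chunks = []  # text fragments that count as prose

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIPPED:
            self.depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIPPED and self.depth > 0:
            self.depth -= 1

    def handle_data(self, data):
        # Keep text only when we are outside every skipped element.
        if self.depth == 0:
            self.chunks.append(data)


def prose_text(html):
    """Return the text of `html` with <style> and <math> content removed."""
    parser = ProseExtractor()
    parser.feed(html)
    parser.close()
    return "".join(parser.chunks)


sample = (
    "<p>Energy <style>.a{color:red}</style>is "
    "<math><mi>E</mi><mo>=</mo></math>conserved.</p>"
)
text = prose_text(sample)
print(len(text), len(text.split()))
```

A real implementation would also need to decide which other elements (references, tables, infoboxes and so on) count as prose; this sketch only covers the two tags named in this report.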
Awesome! I might just work from your code rather than call your API, though, as the XTools Prose API is used quite frequently: some 50,000+ requests a day, and that's not including HTML requests to the "Page History" tool. XTools already has to scrape the HTML anyway for various other stats.
I've followed your blog post to improve XTools' algorithm. The results still don't always match, though. Sometimes XTools overcounts and sometimes yours does, and it's unclear which is correct. de:Provinzial-Heil- und Pflegeanstalt Allenberg, for example, is off by only two words from the new XTools version. I've tried to analyze it line by line and can't see where the differences come from. I did notice, however, that both our implementations may be overcounting words after removing other elements. I lost the example, but sometimes math elements are comma-separated, so removing them leaves stray commas behind, and we're both just splitting on a space character to count words. I'm not sure how to reliably remove punctuation that shouldn't be counted, but it's a trivial difference anyway.
The new implementation is on GitHub, should you wish to review it.
Thanks again for filing this bug!