Follow-up to our idea of computing the number of distinct user agents for pages that receive a high number of requests, some of which might be spammy (coming from bots that do not self-report).
We need to figure out the false positive ratio of applying techniques such as this one.
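
A rough sketch of how this could be measured, under several assumptions that are not settled above: requests are available as (page, user agent) pairs, a page is flagged when it has many requests but very few distinct user agents, and we have a hand-labeled sample of pages to compare against. The thresholds and the names `page_stats`, `flag_pages` and `false_positive_ratio` are hypothetical, and "false positive ratio" is taken here to mean flagged legitimate pages divided by all legitimate pages in the labeled sample.

```python
from collections import defaultdict


def page_stats(requests):
    """Return {page: (request_count, distinct_user_agent_count)}.

    `requests` is an iterable of (page, user_agent) pairs (assumed input shape).
    """
    counts = defaultdict(int)
    agents = defaultdict(set)
    for page, user_agent in requests:
        counts[page] += 1
        agents[page].add(user_agent)
    return {page: (counts[page], len(agents[page])) for page in counts}


def flag_pages(requests, min_requests=100, max_distinct_agents=3):
    """Flag pages with many requests but few distinct user agents.

    Both thresholds are placeholder values, not tuned numbers.
    """
    stats = page_stats(requests)
    return {
        page
        for page, (n_requests, n_agents) in stats.items()
        if n_requests >= min_requests and n_agents <= max_distinct_agents
    }


def false_positive_ratio(flagged, labels):
    """Share of known-legitimate pages that were flagged anyway.

    `labels` maps page -> True if known spammy, False if known legitimate
    (a hypothetical hand-labeled sample). Returns None if the sample
    contains no legitimate pages.
    """
    legit = [page for page, is_spam in labels.items() if not is_spam]
    if not legit:
        return None
    false_positives = sum(1 for page in legit if page in flagged)
    return false_positives / len(legit)
```

With a labeled sample in hand, `false_positive_ratio(flag_pages(requests), labels)` gives one number per threshold setting, so different thresholds can be compared before we apply the technique for real.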