A user of the MediaWiki history dumps points out that it's computationally expensive to find the number of unique editors of each page. I wonder where we stand on adding something like this to the dumps vs. providing a queryable form of this dataset sooner rather than later. This particular user would be happy if we added three fields: page_unique_registered_editors, page_unique_bot_editors, page_unique_anon_editors. Those could be either counts or actual lists of editors. If we had incremental updates, it would be awesome to store these as sets of unique editors and add to them as edits come in.
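To make the request concrete, here is a minimal sketch of how the three fields could be derived, either as sets or as counts. It assumes each revision is available as a dict with the hypothetical keys `page_id`, `user`, `is_bot` and `is_anon`; these names are placeholders for illustration, not the actual column names of the dumps. Keeping sets per page also matches the incremental idea, since a new edit only needs to be added to the right set.

```python
from collections import defaultdict


def unique_editors_by_page(revisions):
    """Collect the unique editors of each page, split by editor type.

    `revisions` is an iterable of dicts with the hypothetical keys
    "page_id", "user", "is_bot" and "is_anon" (assumed names, not the
    real dump schema). Rows can arrive in any order, or incrementally.
    """
    editors = defaultdict(
        lambda: {"registered": set(), "bot": set(), "anon": set()}
    )
    for rev in revisions:
        # Assumption: a bot flag takes precedence over the anonymous flag.
        if rev["is_bot"]:
            kind = "bot"
        elif rev["is_anon"]:
            kind = "anon"
        else:
            kind = "registered"
        editors[rev["page_id"]][kind].add(rev["user"])
    return editors


def editor_counts(editors):
    """Turn the per-page sets into the three proposed count fields."""
    return {
        page_id: {
            "page_unique_registered_editors": len(sets["registered"]),
            "page_unique_bot_editors": len(sets["bot"]),
            "page_unique_anon_editors": len(sets["anon"]),
        }
        for page_id, sets in editors.items()
    }
```

Storing the sets gives exact results but grows with the number of editors per page; storing only the counts is lighter but cannot be updated incrementally without the sets behind them.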
Quote from the original email thread, to give more context about how this is useful:
"Thanks for creating the feature request on number of editors. This would be fantastic. The number of editors who intervened in an article is the most correlated feature to the number of interwiki. Topics interesting to many editors in one language are more likely to be translated.
I'm also computing the number of edits on the talk page of the article. To do this, I follow the same procedure of taking the edit_count of the last revision for a specific page_title and page_namespace = 1. I don't think it is necessary to create another field.
I also compute the number of edits done in the last month (which is a similar metric to seconds since the last edit... to see the amount of recent activity). We have the first_timestamp, I also keep the last_timestamp for every article, but you eventually get this by reading all the revisions and keeping the timestamp (and comparing, now that they do not seem to be in order).
The different reorderings (by page, by edit timestamp and by user) would all be fantastic. I would only do the reorder by user, because that would greatly facilitate the analysis of each editor's lifecycle, and I would include the new features on editor_count (by type). So no need to reorder by page and no need for a lighter dataset including all the unique editors by page. But that's according to some uses I need and I see possible."
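The other per-page measures mentioned in the quote (talk page edit count, last_timestamp, and edits in the last month) can be sketched in a single pass that does not depend on row order. This is only an illustration: the keys `page_namespace`, `page_title` and `timestamp` on each revision dict, the `page_activity` helper and the 30-day window are assumptions, and the counting here tallies revisions directly instead of reading an edit_count field from the last revision.

```python
from datetime import datetime, timedelta


def page_activity(revisions, now, window=timedelta(days=30)):
    """Compute per-page activity measures in one pass over unordered rows.

    Each revision is a dict with the hypothetical keys "page_namespace",
    "page_title" and "timestamp" (a datetime). Pages are keyed by
    (namespace, title), so a talk page is (1, title).
    """
    cutoff = now - window
    stats = {}
    for rev in revisions:
        key = (rev["page_namespace"], rev["page_title"])
        ts = rev["timestamp"]
        entry = stats.setdefault(
            key,
            {
                "first_timestamp": ts,
                "last_timestamp": ts,
                "edit_count": 0,
                "recent_edits": 0,
            },
        )
        # Compare instead of assuming the rows are sorted by time.
        entry["first_timestamp"] = min(entry["first_timestamp"], ts)
        entry["last_timestamp"] = max(entry["last_timestamp"], ts)
        entry["edit_count"] += 1
        if ts >= cutoff:
            entry["recent_edits"] += 1
    return stats
```

With this shape, the number of edits on an article's talk page is `stats[(1, title)]["edit_count"]`, since namespace 1 is the Talk namespace.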