
Sparse data: run API for all allowed articles and counting number of events
Closed, Resolved, Public

Description

The goal of this task is to get a high level understanding of how the Sig Events API is doing in terms of events detected and distribution of event types.

The deliverable is a table (here or in a google doc) with the set of allowed/test articles for this feature as rows, and then the count of Total significant events, as well as counts for each event type as the columns.

Feel free to do multiple runs with different dates or significance thresholds, but the minimum acceptance is just 1 report using the API as currently configured.

Event Timeline

Here's the data I've collected so far. All counts were taken on 10/23/2020. I'll work on getting this in a better format and fixing the odd ones out at the bottom next week.

smallEventsCount - count of revisions deemed small
newTalkPageTopicCount - count of new topics
vandalismRevertCount - count of vandalism revert revisions

otherLargeChangesCount - count of all other revisions considered large (revisions that contain added text and/or deleted text > 100 characters, and/or one or more new references)
addedTextCount - count of revisions that were considered large and had added text
deletedTextCount - count of revisions that were considered large and had deleted text
newReferenceCount - count of new references in revisions that were considered large

significantEventCount - count of all revisions considered "significant", basically how many expanded, non-small views will show up in the feed. This is simply newTalkPageTopicCount + vandalismRevertCount + otherLargeChangesCount.
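For anyone re-checking the totals, here is a minimal sketch of that sum. The field names come from the definitions above; the function name and the sample numbers are made up for illustration and are not taken from any real run.

```python
def significant_event_count(counts):
    """Sum the three event types that count as "significant".

    smallEventsCount is deliberately left out: small revisions are
    not part of the significant total.
    """
    return (
        counts["newTalkPageTopicCount"]
        + counts["vandalismRevertCount"]
        + counts["otherLargeChangesCount"]
    )


# Illustrative numbers only (hypothetical, not real article data).
example_counts = {
    "smallEventsCount": 40,
    "newTalkPageTopicCount": 3,
    "vandalismRevertCount": 5,
    "otherLargeChangesCount": 12,
}

print(significant_event_count(example_counts))  # 3 + 5 + 12 = 20
```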

These titles go to a disambiguation page, resulting in small counts:
Media
XXXX
Raised_by_Wolves

@JMinor Counts are now in a Google doc:

https://docs.google.com/spreadsheets/d/1v_SFKvLPO_rebUixySRiBSqi2l09lMOwy1YIKf2UuGs/edit#gid=0

I only ran each article up to the maximum number of events we have the API set to return (you'll notice the significantEventCount values are all around 100, with the exception of the disambiguation pages mentioned in the comment above). As a reminder, the threshold between a small event and a large event is 100 characters changed (either added or deleted), or a new reference.
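To make the small vs. large rule concrete, here is a rough sketch of how I read it. This is not the API's actual code: the function and parameter names are hypothetical, and I'm assuming added and deleted characters are each compared separately against the threshold with a strict "greater than" (as in the "> 100 characters" wording of the otherLargeChangesCount definition).

```python
LARGE_CHANGE_THRESHOLD = 100  # characters, per the threshold described above


def is_large_revision(added_chars, deleted_chars, new_references,
                      threshold=LARGE_CHANGE_THRESHOLD):
    """Return True if a revision would be treated as a large event.

    Assumptions (not confirmed against the API):
    - added and deleted text are checked independently, not summed
    - the comparison is strict (> threshold)
    - a single new reference is enough on its own
    """
    return (
        added_chars > threshold
        or deleted_chars > threshold
        or new_references >= 1
    )


print(is_large_revision(150, 0, 0))  # large: added text over the threshold
print(is_large_revision(10, 5, 0))   # small: minor edit, no new reference
print(is_large_revision(0, 0, 1))    # large: one new reference
```

Passing a different `threshold` is one way to try alternative significance thresholds in later runs.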

I'm happy to tweak these and gather alternative metrics if we want. The groundwork is done, so tweaking and running again should go faster.