Create a significant changes endpoint that the "Article as a living document" experiment will use to populate its data. We will keep it running in a labs environment. The endpoint needs to do the following to gather the data we need; the specific contract is TBD.
- Flag large and small changes based on a particular byte or character threshold. Ideally, flag based on the number of characters added and deleted, not just the overall article size delta between revisions. [DONE]
- Detect if a reference was added, and generate a readable snippet (not wikitext) from it. [DONE, returning a structured object for the client to turn into a readable snippet]
- Detect large changes, and generate a readable snippet (not wikitext) from them. [DONE, though occasionally the highlighting tags change the Parsoid output. If this turns out to be a problem, we can completely remove the highlighting indicators and truncation.]
- For each of the 3 items above, detect which section the change happened in. [DONE]
- Flag small changes and bundle them up with a count. [DONE]
- Detect if a new section was added in an article's talk page, and generate a readable snippet (not wikitext) from it. [DONE]
- Detect if vandalism was reverted in an article's talk page, generate readable snippet (not wikitext) from that. [DONE]
- Set up in-memory caching for each processed revision in the revision history of an article and its talk page. Check this cache first before processing a revision. [DONE, caching up to 100 significant article events. Events stop being returned beyond that.]
- Also return global counts used to make "x changes by x editors in x days" snippet based solely on cached objects. In theory we should have a decent amount cached at all times. [DONE]
- Once the correct threshold is decided upon, remove that configurable component (as well as page size). Changing thresholds causes the number of cached objects to double, and this functionality will not be needed when we go live.
- Consider adding editor counts as a last step - though keep in mind this shouldn't be cached since it may change with each call. [DONE]
- We should limit how far back in history we can go so users aren't building up cache that will never be seen client side. [DONE]
- If the snippet or counts endpoints fail, do not let the significant changes endpoint error out. [DONE]
- Note for article readable snippets: they need to be set up so that we can easily find them in the article content to highlight, but they should probably not include the entire line, because the snippets need to be short and easy to read (perhaps with ... before and after the added text). This might be a balancing act, or we could return one snippet meant for the modal screen and a separate snippet meant for highlighting on the content screen.
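To make the snippet-length tradeoff in the last item concrete, here is a minimal Python sketch of trimming a changed line around the added text and marking the cuts with "...". The function name `make_snippet` and the `context` parameter are hypothetical and not part of any agreed contract; the real endpoint may work on Parsoid output rather than plain strings.

```python
def make_snippet(line: str, added: str, context: int = 30) -> str:
    """Return a short excerpt of `line` around `added`, with '...' where text was cut.

    `make_snippet` and `context` are illustrative names, not the endpoint's API.
    """
    start = line.find(added)
    if start == -1:
        # Added text not found verbatim: fall back to truncating from the start.
        limit = 2 * context
        return line[:limit] + ("..." if len(line) > limit else "")
    end = start + len(added)
    left = max(0, start - context)          # keep up to `context` chars before
    right = min(len(line), end + context)   # and up to `context` chars after
    prefix = "..." if left > 0 else ""
    suffix = "..." if right < len(line) else ""
    return prefix + line[left:right] + suffix


line = "The quick brown fox jumps over the lazy dog near the river bank"
print(make_snippet(line, "jumps", context=4))  # ...fox jumps ove...
```

Because the excerpt (without the ellipses) is still a verbatim substring of the line, the client could search for it in the article content to highlight. If that proves too fragile, returning the untrimmed added text alongside the trimmed one would correspond to the two-snippet option described above.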
Questions for Product/Design:
- How far back do we allow users to go, both for the significant changes paging and for the "x changes by n editors in z days" summary at the top? - Per product: 100 events.
- The requirements above were based on types gleaned from the mocks, but it would be good to have the types documented and hashed out. - New mocks for every type are in the parent task.
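As a rough illustration of the threshold and bundling requirements listed above, the following Python sketch counts characters added and deleted between two revisions with the standard library's `difflib`, then groups consecutive small changes into a single event with a count. The threshold value, the event type names, and the function names are all assumptions made for the example; the actual service may diff wikitext differently.

```python
from difflib import SequenceMatcher

LARGE_CHANGE_THRESHOLD = 100  # hypothetical value; the real threshold is undecided


def count_added_deleted(old: str, new: str) -> tuple[int, int]:
    """Count characters added and deleted, independent of the net size delta."""
    added = deleted = 0
    matcher = SequenceMatcher(None, old, new, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag in ("replace", "delete"):
            deleted += i2 - i1
        if tag in ("replace", "insert"):
            added += j2 - j1
    return added, deleted


def bundle_events(revisions, threshold=LARGE_CHANGE_THRESHOLD):
    """Turn (old_text, new_text) pairs into events, bundling adjacent small changes.

    The "large-change" / "small-change" labels are placeholders for this sketch.
    """
    events = []
    for old, new in revisions:
        added, deleted = count_added_deleted(old, new)
        if added + deleted >= threshold:
            events.append({"type": "large-change", "added": added, "deleted": deleted})
        elif events and events[-1]["type"] == "small-change":
            events[-1]["count"] += 1  # extend the current bundle
        else:
            events.append({"type": "small-change", "count": 1})
    return events


# Same length before and after, so the size delta is 0, yet 4 chars changed.
print(count_added_deleted("aaaa", "bbbb"))  # (4, 4)
```

The last line shows why a pure size delta is not enough: a revision that replaces text with different text of the same length would otherwise look like no change at all.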