About
See parent ticket here: T140102
Endpoint path under discussion here: T150039
This is a request to deploy a Trending Edits Service. We hope to have the initial deployment done by November 30th, before the holidays. We realize the schedule is tight, so any help with this is greatly appreciated.
This service provides a list of pages that are currently trending, as determined by an algorithm that analyzes the rate of edits.
Service Architecture
Language
NodeJS
Diagram
TBD
Connection
To process edits in real time, the service uses WebSockets (socket.io) to connect to the RC Stream.
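A minimal sketch of the connection wiring, assuming the `subscribe` / `change` event names used in the RCStream client examples. The socket is passed in rather than created inside the function, so the wiring can be exercised without a network connection. The function name `subscribeToRcStream` is hypothetical, and the production lines in the comment should be checked against the current socket.io-client and RCStream docs.

```javascript
// Hypothetical helper: attach RC Stream handlers to an already-created
// socket.io client socket.
function subscribeToRcStream(socket, wiki, onChange) {
  // Subscribe on every (re)connect, since subscriptions belong to a connection.
  socket.on('connect', () => {
    socket.emit('subscribe', wiki);
  });
  // Each 'change' event delivers one recent-change object.
  socket.on('change', onChange);
  return socket;
}

// In production this would look roughly like (assumption, not executed here):
//   const io = require('socket.io-client');
//   const socket = io.connect('https://stream.wikimedia.org/rc');
//   subscribeToRcStream(socket, 'en.wikipedia.org', handleChange);
```

Injecting the socket also makes reconnect behaviour explicit: because the subscription is re-sent on every `connect`, a dropped connection does not silently stop the feed.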
Event Collection
Each time an event is received from the RC Stream, the service runs several checks to exclude changes that aren't relevant to the algorithm (such as bot edits, moves, and deletions).
From here, each relevant event is counted as an "edit" (reverts are tracked separately). If the page related to the edit is not already being tracked, a page object is created and stored in a hash keyed by the page's id.
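The relevance checks can be sketched as a single predicate. This assumes the change objects carry `wiki`, `bot` and `type` fields, with moves and deletions arriving as `type: 'log'` entries; those field names and values are assumptions to verify against real events. The `wiki` check reflects the current English-Wikipedia-only scope.

```javascript
// Assumption: only these change types represent actual page edits;
// log entries (moves, deletions, etc.) are dropped.
const RELEVANT_TYPES = new Set(['edit', 'new']);

// Hypothetical predicate: returns true if a change should be counted.
function isRelevant(change, wiki = 'enwiki') {
  if (!change || change.wiki !== wiki) return false;  // only the target project
  if (change.bot) return false;                       // exclude bot edits
  return RELEVANT_TYPES.has(change.type);             // exclude log actions such as move/delete
}
```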
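A sketch of the per-page tracking, using a `Map` as the id-keyed hash. The event fields `pageId`, `title`, `user`, `anon` and `isRevert` are hypothetical; how reverts are detected is not described here, so the sketch assumes an upstream step sets `isRevert`. It also assumes a revert increments only the revert counter, not the edit counter.

```javascript
// Hypothetical: record one filtered event against its page, creating the
// page object on first sight.
function trackEdit(pages, event, now = Date.now()) {
  let page = pages.get(event.pageId);
  if (!page) {
    page = {
      id: event.pageId,
      title: event.title,
      firstSeen: now,           // used later for age-based pruning
      edits: 0,
      reverts: 0,
      anonEdits: 0,
      contributors: new Set(),  // distinct users who touched the page
    };
    pages.set(event.pageId, page);
  }
  if (event.isRevert) {
    page.reverts += 1;          // reverts are counted separately
  } else {
    page.edits += 1;
  }
  if (event.anon) page.anonEdits += 1;
  page.contributors.add(event.user);
  return page;
}
```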
In Memory Cache
The page id hash is pruned every 20 seconds to keep the memory footprint as small as possible. Pages are kept for at least 5 minutes; after that, they must be edited at least 3 times a minute to stay in the hash. Any page older than 1 day is automatically purged to keep the data fresh.
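The pruning rules can be sketched as below, operating on page objects that have `firstSeen` (ms timestamp) and `edits` fields, which are assumed names. The text does not say over what window "3 times a minute" is measured, so this sketch assumes the average rate since the page was first seen.

```javascript
const MINUTE = 60 * 1000;
const PRUNE_INTERVAL_MS = 20 * 1000;  // prune every 20 seconds
const MIN_AGE_MS = 5 * MINUTE;        // grace period before the rate check applies
const MAX_AGE_MS = 24 * 60 * MINUTE;  // purge anything older than 1 day
const MIN_EDITS_PER_MINUTE = 3;

// Remove pages that are too old or not edited often enough.
// Deleting from a Map while iterating it with for...of is safe in JS.
function prune(pages, now = Date.now()) {
  for (const [id, page] of pages) {
    const age = now - page.firstSeen;
    if (age < MIN_AGE_MS) continue;  // too new to judge
    const rate = page.edits / (age / MINUTE);
    if (age > MAX_AGE_MS || rate < MIN_EDITS_PER_MINUTE) {
      pages.delete(id);
    }
  }
  return pages;
}

// Run prune on a timer; unref() keeps the timer from holding the process open.
function startPruning(pages) {
  const timer = setInterval(() => prune(pages), PRUNE_INTERVAL_MS);
  timer.unref();
  return timer;
}
```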
Persistence
The current prototype uses LevelDB to persist changes, but this will be removed before production. For the initial deployment, the service will not persist any data and will instead rely only on memory. Persistence may be re-added later if needed, though it may become unnecessary if the service moves to Kafka, which can replay events as needed.
API
When the trending API is called, it first checks for a cached response and recalculates only if necessary. It calculates a score based on several factors, such as edits, reverts, anonymous edits, number of contributors, and views. The algorithm itself is still being tuned to find a good balance for surfacing trending articles.
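A sketch of the cache-then-recalculate flow with a weighted score over the listed factors. The weights (including the negative sign on reverts), the cache TTL, the default result limit and the `page.views` field are all hypothetical placeholders, since the real algorithm is still being tuned.

```javascript
// Hypothetical weights: illustrative only, not the service's actual values.
const WEIGHTS = { edits: 1, reverts: -2, anonEdits: 0.5, contributors: 2, views: 0.001 };
const CACHE_TTL_MS = 60 * 1000;  // hypothetical cache lifetime

function score(page) {
  return (
    WEIGHTS.edits * page.edits +
    WEIGHTS.reverts * page.reverts +
    WEIGHTS.anonEdits * page.anonEdits +
    WEIGHTS.contributors * page.contributors.size +
    WEIGHTS.views * (page.views || 0)  // assumption: views are filled in elsewhere
  );
}

// Returns a handler that serves a cached ranking until it expires.
function createTrendingHandler(pages, { ttlMs = CACHE_TTL_MS, limit = 10 } = {}) {
  let cached = null;  // { at, body }
  return function getTrending(now = Date.now()) {
    if (cached && now - cached.at < ttlMs) return cached.body;  // cache hit
    const body = [...pages.values()]
      .map((page) => ({ title: page.title, score: score(page) }))
      .sort((a, b) => b.score - a.score)
      .slice(0, limit);
    cached = { at: now, body };
    return body;
  };
}
```

Keeping the cache inside the handler's closure ties one cache to one `limit`, so callers with different limits do not receive each other's cached lists.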
Projects
Currently the project only processes events for the English Wikipedia. The current algorithm only sees enough events on high-traffic pages. After the initial testing phase for EN, we will be exploring options to expand to other wikis.
Service Info
| Owner | Reading Mobile-Content-Service |
| Contact person | @bearND / @Jdlrobson / @Fjalapeno |
| Timeline | November 30, 2016 |
| Source code | trending-edits |
| Prototype code | weekipedia, wikitrender |
| Labs Prototype | trending-edits-api, trending-edits-ui |
| Target cluster | SCB |