The trending services used on Pushipedia and trending suffer from occasionally notifying the reader of an edit war caused by vandalism. We'd like to use ORES to separate good trends from bad ones. This would allow us in future to provide separate streams: one for editors interested in being notified of vandalism in real time, and one for readers being notified of hot topics.
Description
| Status | Subtype | Assigned | Task |
|---|---|---|---|
| Declined | | None | T145829 Trending API should consult ORES |
| Resolved | | Tgr | T153321 [Spike] Review ORES architecture for Reading Product plans |
Event Timeline
@mobrovac @Pchelolo this is something we are going to begin looking into while evaluating the ORES service. (@Tgr is going to be doing some scoped investigations into a few use cases like T157132 in regards to scaling issues).
I don't think this is a high traffic use case for ORES, but just want to put that on your radar.
@Fjalapeno If we decide to go with T157132 this task will get resolved automatically, since we use content hydration to construct the trending response: whatever you add to the summary automagically gets added to trending and to the feed. I think we can just close this one and merge the discussion into the one about summaries.
I think that would handle getting the data. I think Jon will want this ticket for the work that specifically takes the ORES value from the summary and uses it as part of the algorithm in the trending service.
> I think Jon will want this ticket for the work that specifically takes the ORES value from the summary and uses it as part of the algorithm in the trending service
Hm, that's a slightly different story. We would again have the same issue I described in T157132#2997281: lots of stuff follows the edits, and if all of it starts making requests to ORES, those requests might hit ORES before its caches are updated, duplicating or triplicating the load. In the summary case we might invent some kind of flow control (like rerendering the summary only after the ORES data is ready), but here we don't have any control at all. Maybe adding a 10s delay before actually processing the edit in the trending service might be a good idea.
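The 10s delay suggested above could be sketched roughly as follows. This is only an illustration, not how the trending service is implemented: `DelayedEditQueue`, its method names, and the edit payload shape are all hypothetical, and the clock is passed in explicitly so the behaviour is easy to check.

```python
import heapq


class DelayedEditQueue:
    """Holds edits back for a fixed delay before releasing them.

    Hypothetical helper: the idea is to give ORES time to score and cache
    a revision before the trending service asks about it.
    """

    def __init__(self, delay_seconds=10.0):
        self.delay_seconds = delay_seconds
        self._heap = []      # entries are (release_time, sequence, edit)
        self._sequence = 0   # tie-breaker so edits themselves are never compared

    def push(self, edit, now):
        """Queue an edit seen at time `now` (seconds)."""
        heapq.heappush(self._heap, (now + self.delay_seconds, self._sequence, edit))
        self._sequence += 1

    def pop_ready(self, now):
        """Return, oldest first, every edit whose delay has elapsed by `now`."""
        ready = []
        while self._heap and self._heap[0][0] <= now:
            ready.append(heapq.heappop(self._heap)[2])
        return ready
```

A consumer would call `push` for each incoming edit and poll `pop_ready` with the current time, processing only what it returns.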
My hope would be that an ORES score is in the Kafka event in some form. What I'm interested in is detecting vandalism.
If that can't be done, I've also been imagining how we might use it only upon sending the trending API response on the top edited pages.
> My hope would be that an ORES score is in the Kafka event in some form. What I'm interested in is detecting vandalism.
That actually would be a very viable use-case for the Edit-Review-Improvements-ReviewStream project that's being discussed right now.
> If that can't be done, I've also been imagining how we might use it only upon sending the trending API response on the top edited pages.
That can be done, we just need a new, enriched, topic for revision-creates.
+1 to that. Let's first conclude the work on ReviewStream and then see how we can use this in the trending edits service.
Would a dedicated stream of revision-scores (ORES scores) work for this? We are talking about adding something like this in T143743#2966929. That would look something like {'rev_id': 1234, 'ores': { ... } }. Something would have to associate the revision-score event with the revision-create event, since the scoring happens later than the create.
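One way to associate the two events is to buffer whichever arrives first and pair them on `rev_id`. The sketch below assumes only the `rev_id` and `ores` fields mentioned above; the `RevisionJoiner` class, its method names, and the example payloads are hypothetical, and a real consumer would also need to expire entries that never get a match.

```python
class RevisionJoiner:
    """Pairs revision-create events with revision-score events by rev_id.

    Hypothetical sketch: whichever event arrives first is buffered until
    its counterpart shows up, then a merged event is returned.
    """

    def __init__(self):
        self._pending_creates = {}  # rev_id -> revision-create event
        self._pending_scores = {}   # rev_id -> revision-score event

    def on_create(self, event):
        """Handle a revision-create event; return a merged event or None."""
        rev_id = event["rev_id"]
        score = self._pending_scores.pop(rev_id, None)
        if score is None:
            self._pending_creates[rev_id] = event
            return None
        return {**event, "ores": score["ores"]}

    def on_score(self, event):
        """Handle a revision-score event; return a merged event or None."""
        rev_id = event["rev_id"]
        create = self._pending_creates.pop(rev_id, None)
        if create is None:
            self._pending_scores[rev_id] = event
            return None
        return {**create, "ores": event["ores"]}
```

Because scoring happens after the create, `on_score` will usually be the call that returns the merged event, but the buffer on both sides keeps the join correct if the two streams are consumed out of order.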
@Ottomata reading that ticket, it seems the basic thrust is that you want to provide separate streams for different types of data? Meaning that since ORES scores will delay processing of edit events, those should be delivered separately?
@Jdlrobson would it be ok to receive the events separately (or maybe just wait for the ORES event instead?)
One interesting point that Aaron raised is that while wp10 scores are probably not interesting for a trending pages service, wp10 deltas (i.e. how much the page quality increased with the last edit or in the last day) might very much be. That could just be handled by the trending service making a bunch of extra API calls, but it's maybe generic enough to be worth injecting somewhere upstream in the data.
+1. I can see the delta being relevant to other parties as well (ReviewStream, e.g.). We should consider this use case as well when we construct ReviewStream, so that it is readily available to all the interested parties without extra overhead.
Additionally @dr0ptp4kt had an idea about including 1st and 2nd derivatives as well, so we can gauge how quickly these scores are changing.
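For illustration, the deltas and derivatives discussed above can be approximated with finite differences over a page's score history. This is a sketch under assumptions: the input is taken to be a time-sorted list of `(timestamp_seconds, score)` pairs, and `finite_differences` is a hypothetical helper, not an existing ORES or trending-service function.

```python
def finite_differences(samples):
    """Approximate the 1st and 2nd derivatives of a score over time.

    `samples` is a list of (timestamp_seconds, score) pairs sorted by
    strictly increasing timestamp. Each result is a list of
    (midpoint_timestamp, value) pairs.
    """
    # First difference: change in score per second between adjacent samples.
    first = []
    for (t0, s0), (t1, s1) in zip(samples, samples[1:]):
        first.append(((t0 + t1) / 2, (s1 - s0) / (t1 - t0)))

    # Second difference: change in that rate per second.
    second = []
    for (t0, r0), (t1, r1) in zip(first, first[1:]):
        second.append(((t0 + t1) / 2, (r1 - r0) / (t1 - t0)))

    return first, second
```

The plain delta for the last edit is just the numerator of the final first-difference term; dividing by the elapsed time makes edits that are close together comparable with edits that are far apart.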
I can file a ticket with these requests as well.
@Fjalapeno @dr0ptp4kt it would be helpful if you could comment to that effect on T143743: Set up the foundation for the ReviewStream feed as I think such requests should be made part of the ReviewStream requirements.
Sure, but is there an open task relating to exposing ORES in the EventsStream? Maybe T157132?