[SPIKE] How could we add topic filtering to Recent Changes? [8H]
Closed, Resolved, Public

Description

Per the parent task, we would like to explore adding topic filtering in Recent Changes. As discussed in that task, the technical approach has a number of options. The most recent is described in T245906#6051840.

We might ideally prefer not to depend on the ORES extension, since it isn't deployed everywhere. Alternatively, we could use it and pursue deploying it to all Wikimedia projects.

See also T380825: Make ORES topics and their translations easily available to MediaWiki extensions.

We should make a decision, or lay out the options, for how we could store and retrieve data on the topic of the page a recent changes edit is made to.

Per T381571: [DESIGN] How could we enable editors to filter by topic in Recent Changes?, this can be added to RCFilters as another 'advanced' filter, alongside Namespace and Tags. It does not need a new UI compared to those. We'll try to work on this before the Codex update, so we'll worry about the best way to implement this UI as part of that work.

Event Timeline

Scardenasmolinar renamed this task from [SPIKE] How could we add topic filtering to Recent Changes? to [SPIKE] How could we add topic filtering to Recent Changes? [16H]. Dec 10 2024, 4:15 PM
Scardenasmolinar moved this task from To be estimated to Estimated on the Moderator-Tools-Team board.

it looks like the proposed taxonomy will be

Scardenasmolinar renamed this task from [SPIKE] How could we add topic filtering to Recent Changes? [16H] to [SPIKE] How could we add topic filtering to Recent Changes? [8H]. Jan 7 2025, 4:30 PM
Scardenasmolinar triaged this task as High priority.
Should we use ORES (extension) or not?

Though it's technically feasible to implement this without ORES, using a deferred update that calls the Liftwing API and stores the topics in the recentchanges table, this seems to run counter to how we're currently using machine learning filters on Recent Changes (through the ORES extension).

ORES is the extension that integrates data from the ORES (now Liftwing) service into the RecentChanges view.

Per documentation:

It is installed on several Wikimedia sites, but no longer deployed to new ones.

For wikis that do not want machine learning UI filters in Recent Changes, this extension would not be installed. For wikis that do want them, it is either installed already or, if not, we have already identified a process for installing it in T382171: Install ORES extension on idwiki.

How to store this data on the rc table? Do we want to?

In my opinion, we don't want to store this on the recent changes table because the data modeling would be awkward: we'd have to store the topics in a JSON blob, since many topics can belong to one article and therefore to one recent change as well.
Since the ORES extension is the current implementation for adding machine learning filters to Recent Changes, I suggest keeping it there until we can replace or refactor it.
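
To illustrate the modeling concern, here is a minimal sketch comparing a JSON blob on the change row with one row per (change, topic) in a separate table. It uses Python's sqlite3 purely for demonstration; every table name, column name, and topic label in it (rc_blob, rc_plain, rc_topic, rct_topic, "physics", and so on) is hypothetical and not part of the MediaWiki or ORES schema.

```python
import json
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Option A (hypothetical): topics packed into a JSON blob on the change row.
    CREATE TABLE rc_blob (rc_id INTEGER PRIMARY KEY, rc_title TEXT, rc_topics TEXT);
    -- Option B (hypothetical): one row per (change, topic) in a side table.
    CREATE TABLE rc_plain (rc_id INTEGER PRIMARY KEY, rc_title TEXT);
    CREATE TABLE rc_topic (rct_rc_id INTEGER, rct_topic TEXT);
    CREATE INDEX rct_topic_idx ON rc_topic (rct_topic, rct_rc_id);
""")

# Placeholder data: each change can carry several topics.
changes = [
    (1, "Marie Curie", ["biography", "physics"]),
    (2, "Mount Fuji", ["geography"]),
    (3, "Niels Bohr", ["biography", "physics"]),
]
for rc_id, title, topics in changes:
    conn.execute("INSERT INTO rc_blob VALUES (?, ?, ?)",
                 (rc_id, title, json.dumps(topics)))
    conn.execute("INSERT INTO rc_plain VALUES (?, ?)", (rc_id, title))
    conn.executemany("INSERT INTO rc_topic VALUES (?, ?)",
                     [(rc_id, topic) for topic in topics])


def filter_blob(topic):
    """Option A: every row is fetched and decoded in application code."""
    rows = conn.execute("SELECT rc_id, rc_topics FROM rc_blob ORDER BY rc_id")
    return [rc_id for rc_id, blob in rows if topic in json.loads(blob)]


def filter_rows(topic):
    """Option B: the database filters with an indexable equality join."""
    rows = conn.execute(
        "SELECT rc_id FROM rc_plain JOIN rc_topic ON rct_rc_id = rc_id "
        "WHERE rct_topic = ? ORDER BY rc_id",
        (topic,),
    )
    return [row[0] for row in rows]
```

Both functions return the same change IDs, but only the second lets the database do the filtering, which is the shape the existing ORES classification storage already has.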

The ORES extension does not currently have any integration with the article topic model API; however, we could very easily add one to the LiftwingService class (once the API is available; it looks like the current docs are still under construction).
We could then store the article topic classification in the oresc_class column or another column like oresc_class_text. (Scaling considerations make me lean toward oresc_class_text, as oresc_class is a tinyint, as mentioned here.)
We then wouldn't even need to retrieve all topics from the WikimediaMessages extension, as we'd have the topic in the oresc_class_text column.
We could then build upon the query builder's join conditions, as we do for the other filters.
This method uses the ChangesListSpecialPageStructuredFilters hook.
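
As a rough sketch of the join-based filtering described above, the following Python/sqlite3 example adds a join condition on a topic text column when topics are selected. It is not the ORES extension's code: the tables are simplified stand-ins, oresc_class_text is the column proposed in this comment rather than an existing one, the topic labels are placeholders, and build_topic_query and recent_changes are hypothetical helper names.

```python
import sqlite3


def build_topic_query(topics):
    """Return (sql, params) for changes matching any of the given topics."""
    # DISTINCT avoids duplicate rows when one change matches several topics.
    sql = "SELECT DISTINCT rc_id, rc_title FROM recentchanges"
    params = []
    if topics:
        placeholders = ", ".join("?" for _ in topics)
        sql += (
            " JOIN ores_classification ON oresc_rev = rc_this_oldid"
            f" AND oresc_class_text IN ({placeholders})"
        )
        params.extend(topics)
    sql += " ORDER BY rc_id DESC"
    return sql, params


conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Simplified stand-ins for the real tables; only the columns used here.
    CREATE TABLE recentchanges (
        rc_id INTEGER PRIMARY KEY, rc_this_oldid INTEGER, rc_title TEXT);
    CREATE TABLE ores_classification (oresc_rev INTEGER, oresc_class_text TEXT);
    INSERT INTO recentchanges VALUES
        (1, 101, 'Marie Curie'), (2, 102, 'Mount Fuji'), (3, 103, 'Niels Bohr');
    INSERT INTO ores_classification VALUES
        (101, 'biography'), (101, 'physics'),
        (102, 'geography'), (103, 'physics');
""")


def recent_changes(topics=()):
    """Run the built query and return titles, newest change first."""
    sql, params = build_topic_query(list(topics))
    return [title for _, title in conn.execute(sql, params)]
```

With no topics selected the query is left unjoined, so changes without any stored classification still appear in the unfiltered list.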

Moving to engineering review to get some feedback from other engineers.

The findings look good to me; thanks for your research!

jsn.sherman added a subscriber: Scardenasmolinar.

@Scardenasmolinar looked at this today and approved the findings; we'll need to create a task to implement this feature in ORES. I'll leave it open until that is done.

Correction (the above was a quick note from an engineering meeting): we can use T245906: Expose ORES topics in recent changes filters, which is in the product backlog. I'm going to stall it out since this is waiting on other work.