Add a link engineering: exclude good and featured articles
Closed, Declined (Public)

Description

There are certain types of articles that we do not want to suggest to users for adding links, because they are not productive places for newcomers to add links. Two of those types are "good" and "featured" articles. Because those articles are crafted carefully by experienced Wikipedians, a link that is missing from one was most likely left out deliberately, and newcomers should not override that choice.

This is similar to work that Platform Engineering is doing for image suggestions in:

Event Timeline

I know this is likely more challenging than T279128 (Add a link engineering: exclude disambiguation pages), because the "good" and "featured" designations come from different templates on each wiki. This is not required for our initial release, but if it turns out not to be that challenging and it makes sense to do during initial task generation, we should discuss it.

Per T266443: Add Link engineering: On-wiki configuration, you can already exclude specific categories or templates (although only globally for a given wiki, not for a specific task type).

Good/featured flags can be accessed internally as Wikidata badges, so it shouldn't be too complicated to do it directly either, although I have never worked with that part of the codebase.

@MMiller_WMF is that good enough for now? If you want to provide us with a list of templates / categories that should be excluded for the target wikis, we can add that configuration to MediaWiki:NewcomerTasks.json before we start filling up the task pools. One important side note that @Tgr pointed out: the exclusion list applies to all task types, not just the new link recommendation one.
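To make the exclusion idea concrete, here is a minimal sketch of how a list of excluded templates and categories could be turned into negated search terms. `hastemplate:` and `incategory:` are existing CirrusSearch keywords and a leading `-` negates a term, but whether task generation applies the exclusion list this way is an assumption, and the function name and the example template / category names are hypothetical.

```python
def exclusion_search_terms(excluded_templates, excluded_categories):
    """Build negated CirrusSearch terms from an exclusion list.

    Hypothetical helper: one negated term is emitted per name, so a page
    carrying any listed template or category is filtered out.
    """
    def quote(name):
        # Wrap in double quotes so multi-word names stay a single term.
        return '"' + name.replace('"', '\\"') + '"'

    terms = ["-hastemplate:" + quote(t) for t in excluded_templates]
    terms += ["-incategory:" + quote(c) for c in excluded_categories]
    return " ".join(terms)


# Example names only; each wiki uses its own templates and categories.
print(exclusion_search_terms(["Good article"], ["Featured articles"]))
```

Because the terms are not tied to a task type, this also illustrates the side note above: an exclusion expressed this way would filter every task pool on the wiki, not only link recommendations.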

Not sure if this is helpful, or whether you already know all of this.
You can get the relevant articles via the Wikidata Query Service.
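For illustration, a query along these lines could be built as follows. This is only a sketch: the badge item IDs (Q17437796 for featured, Q17437798 for good) are the ones I understand Wikidata to use and should be double-checked, and the helper names are made up for this example. The code only builds the query and the request URL; it does not send anything.

```python
from urllib.parse import urlencode

WDQS_ENDPOINT = "https://query.wikidata.org/sparql"

# Assumed badge items; verify on Wikidata before relying on them.
BADGES = {"featured": "Q17437796", "good": "Q17437798"}


def build_badge_query(site_url, badge_qids, limit=100):
    """Return a SPARQL query listing sitelinks on one wiki with given badges."""
    values = " ".join("wd:" + qid for qid in badge_qids)
    lines = [
        "SELECT ?article ?badge WHERE {",
        "  VALUES ?badge { " + values + " }",
        # Sitelink nodes are linked to their wiki via schema:isPartOf.
        "  ?article schema:isPartOf <" + site_url + "> ;",
        "           wikibase:badge ?badge .",
        "}",
        "LIMIT " + str(int(limit)),
    ]
    return "\n".join(lines)


def build_request_url(query):
    """Return a GET URL asking the query service for JSON results."""
    return WDQS_ENDPOINT + "?" + urlencode({"query": query, "format": "json"})


query = build_badge_query("https://cs.wikipedia.org/", list(BADGES.values()))
print(query)
print(build_request_url(query))
```

The site URL is a parameter because the badge items are shared across wikis, so the same query shape should work for each target wiki.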

Not sure how easy it is to make/access those queries from the production environment. Otherwise, one could easily build a table listing the articles that carry the relevant badges from the Wikidata dumps in Hive (this and this) when running the pipeline for training the model. This could be similar to the existing tables we use for looking up page information.
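A rough sketch of the dump-based lookup, in plain Python rather than Hive. It assumes the entity layout of the Wikidata JSON dumps, where each entity has a `sitelinks` map whose entries carry a `title` and a list of `badges`, and where the file is one large array with one entity per line. The function names and the badge IDs are assumptions for illustration; the Hive tables may expose the same data under a different schema.

```python
import json

# Assumed badge items (featured, good); verify before use.
BADGE_QIDS = {"Q17437796", "Q17437798"}


def badge_rows(entity, wanted_badges=BADGE_QIDS):
    """Return (wiki_db, title, badge) rows for one parsed entity."""
    rows = []
    for wiki_db, link in entity.get("sitelinks", {}).items():
        for badge in link.get("badges", []):
            if badge in wanted_badges:
                rows.append((wiki_db, link["title"], badge))
    return rows


def build_lookup(lines):
    """Map wiki_db -> set of badged titles from JSON-dump lines."""
    lookup = {}
    for line in lines:
        # Entity lines end with a comma; the array brackets sit on
        # their own lines and are skipped.
        line = line.strip().rstrip(",")
        if not line or line in ("[", "]"):
            continue
        for wiki_db, title, _badge in badge_rows(json.loads(line)):
            lookup.setdefault(wiki_db, set()).add(title)
    return lookup
```

With a lookup like this, excluding an article at task-generation time becomes a set membership check on its title for the wiki in question.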

There are also a number of other quality badges that might be relevant, such as A-class: https://w.wiki/3AL8

@kostajh -- I think the capability to exclude categories and templates will suffice for now, but that we shouldn't actually use it yet. For our initial release, let's not do anything with it, but then if communities have questions or concerns about Good or Featured (or any other sort of article), we could decide to change the configuration and repopulate their tasks. Therefore, I am moving this task to the "improvements" epic.

In the longer term, we would want this to be one of the community configurations available via @Urbanecm_WMF's work in T274520: Move Growth configuration to on-wiki JSON file.

Actually, I guess we're not going to need to do any work on this, since we have the capability. I'll resolve it.

Excluding categories or templates is part of the task type configuration, which has always been editable by the community (although only as raw JSON for now).