
Investigate FeaturedFeeds
Closed, Resolved (Public)

Description

What is the One True Source for discovering all of the featured content on a wiki? Is there one? Can we find out by looking at the FeaturedFeeds source? Can we make one if one doesn't exist? Let's investigate.

Event Timeline

bearND added subscribers: MaxSem, bearND.

Info about the FeaturedFeeds extension: https://www.mediawiki.org/wiki/Extension:FeaturedFeeds
Source: https://phabricator.wikimedia.org/diffusion/EFFD/repository/master/

Some more info about what regex AnomieBOT looks for when creating the TFA (Today's Featured Article) can be found in
https://phabricator.wikimedia.org/T148636#2729147. I'm not sure whether the FeaturedFeeds extension uses anything from AnomieBOT, since AnomieBOT seems to do this for enwiki, whereas the FeaturedFeeds extension seems to work for a big list of wikis.

@MaxSem would you tell us where this extension gets its data from?

It gets connected to templates that display featured content on the main page. So while, yes, it's more reliable than regexps, it can still be broken by changes to those templates.

@MaxSem Interesting. How does it get connected to templates? Is this something we could do with a RESTBase service, like MCS?

Via several messages that wikis have to edit. There's nothing preventing you from doing the same.
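For illustration, here is a sketch that builds an Action API request reading one of those messages directly. It assumes the message key follows the `Ffeed-<feed>-page` pattern of MediaWiki:Ffeed-featured-page discussed in this task, and uses the standard `meta=allmessages` query module with `amenableparser` so that templates in the message are expanded. The helper names are hypothetical, and this is not necessarily how the extension itself reads its configuration.

```python
from typing import Optional
from urllib.parse import urlencode


def ffeed_message_key(feed: str) -> str:
    """Message key for a feed's page, e.g. "featured" -> "ffeed-featured-page".

    Assumes the Ffeed-<feed>-page naming seen in MediaWiki:Ffeed-featured-page.
    """
    return f"ffeed-{feed}-page"


def allmessages_url(domain: str, feed: str, lang: Optional[str] = None) -> str:
    """Build an Action API URL that returns the feed message, templates expanded."""
    params = {
        "action": "query",
        "meta": "allmessages",
        "ammessages": ffeed_message_key(feed),
        "amenableparser": "1",  # preprocess the message (magic words, templates)
        "format": "json",
        "formatversion": "2",
    }
    if lang is not None:
        params["amlang"] = lang  # request the message in a specific language
    return f"https://{domain}/w/api.php?{urlencode(params)}"


print(allmessages_url("en.wikipedia.org", "featured"))
```

Expanding the message server-side keeps any date logic in the wiki's own templates, which is the point MaxSem makes above: the wiki, not the client, owns the mapping.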

Here are some notes and links from talking to @MaxSem about FeaturedFeeds configurations: P4767
I'd still like to understand this better.

Can we use MediaWiki:Ffeed-featured-page?

I would be happy if we could use this to find the featured article title or page. Results seem to vary quite a bit between wikis, though. Best results first:

Limited to one week

Based on the weekday on which it is executed. An arbitrary date cannot be specified.

Limited to current day

Limited to current day but shows an old result

I don't understand why it shows the result for an earlier day.

No result

Incorrect results/Misconfigured?

For some wikis this page just links to the translated text of "Wikipedia, the free encyclopedia"

That's the site subheading ("From Wikipedia, the free encyclopedia"), not the result of parsing the page content. The content itself evaluates to an empty string.

The featuredfeed API action
https://en.wikipedia.org/wiki/Special:ApiSandbox#action=featuredfeed&format=json&feedformat=rss&feed=featured
produces an RSS feed containing items with <link> elements like this: https://en.wikipedia.org/wiki/Special:FeedItem/featured/20170124000000/en
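A minimal standard-library sketch of pulling those item links out of the RSS returned by `action=featuredfeed`. The sample feed is a trimmed, hand-written illustration in the shape described above, not real API output; the channel-level `<link>` is skipped so only per-item links are returned.

```python
import xml.etree.ElementTree as ET
from typing import List


def feed_item_links(rss_xml: str) -> List[str]:
    """Return the <link> text of every <item> in an RSS 2.0 document."""
    root = ET.fromstring(rss_xml)
    links = []
    for item in root.iter("item"):
        link = item.findtext("link")
        if link:  # skip items with a missing or empty <link>
            links.append(link.strip())
    return links


# Hand-written sample (not real API output).
SAMPLE_RSS = """<rss version="2.0"><channel>
<title>Featured articles</title>
<link>https://en.wikipedia.org/wiki/Main_Page</link>
<item>
<title>Example item</title>
<link>https://en.wikipedia.org/wiki/Special:FeedItem/featured/20170124000000/en</link>
</item>
</channel></rss>"""

print(feed_item_links(SAMPLE_RSS))
```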

Works

-> https://no.wikipedia.org/wiki/Spesial:FeedItem/featured/20170116000000/nb (Note that the language is 'nb' but the wiki is 'no'.)

Does not work/The featured feed does not exist

Even with these, we would need to parse the HTML to find the title. We could use the first bolded link if there is no special markup to be found.
The nice thing is that the thumbnail for the featured article is included on those pages.
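The "first bolded link" fallback could look roughly like this with the standard-library HTML parser. It assumes the FeedItem page renders the featured article as a bolded wikilink, i.e. `<b><a ... title="...">`, and takes the page title from the link's `title` attribute. The class and function names and the sample HTML are illustrative only.

```python
from html.parser import HTMLParser
from typing import Optional


class FirstBoldLinkFinder(HTMLParser):
    """Record the title attribute of the first titled <a> nested inside a <b>."""

    def __init__(self):
        super().__init__()
        self.bold_depth = 0  # number of <b> elements currently open
        self.title: Optional[str] = None

    def handle_starttag(self, tag, attrs):
        if tag == "b":
            self.bold_depth += 1
        elif tag == "a" and self.bold_depth > 0 and self.title is None:
            self.title = dict(attrs).get("title")

    def handle_endtag(self, tag):
        if tag == "b" and self.bold_depth > 0:
            self.bold_depth -= 1


def first_bold_link_title(html: str) -> Optional[str]:
    finder = FirstBoldLinkFinder()
    finder.feed(html)
    finder.close()
    return finder.title


sample = ('<p>See <a href="/wiki/Other" title="Other">other</a>. '
          '<b><a href="/wiki/Example_article" title="Example article">Example</a></b> is ...</p>')
print(first_bold_link_title(sample))
```

As noted above, this breaks as soon as a wiki bolds something else first, which is why explicit markup would be preferable.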

> That's the site subheading ("From Wikipedia, the free encyclopedia"), not the result of parsing the page content. The content itself evaluates to an empty string.

Makes sense. So, they are basically in the same category as 'No result'.

@MaxSem Any idea why FeaturedFeeds doesn't return anything for some of the WP projects? (See above for examples: es, it, nl, sv, pt, ru)

Because it's not set up there.

You can also ask for global interface editor rights temporarily to edit the local configuration and make FeaturedFeeds work in more wikis. I see most of the people here have only about 300 total edits on the wikis, but I'm sure MaxSem can help you with editing. :)

Change 340800 had a related patch set uploaded (by BearND):
[mediawiki/services/mobileapps] Spike: Which language projects support FeaturedFeed extension

https://gerrit.wikimedia.org/r/340800

Here's what we can get for wikis that follow the conventions set forth by the FeaturedFeeds extension:

The main blockers for getting TFA for arbitrary dates are really big differences between Wikipedia projects in how they handle TFA. Different PHP time formats are used in the templates to specify the date. dewiki only addresses a week's worth of TFA articles through the FeaturedFeeds extension convention, since it uses the day of the week.
Languages we can cover through this are 8 languages in total: ["de","en","fa","fr","he","hu","ja","zh"], 7 more than we currently support. In theory, Greek and Urdu also appear to work with the API call I make in the patch, but the former seems to point to older TFAs while the latter points to a page containing multiple TFAs. We will have to exclude 'zh' until the language variant issue is resolved.

An alternative approach (esp. for languages not configured for the FeaturedFeeds extension) is to use the main page and look for an entry that has the class or microformat attributes we would define. Ideally we'd have a combination of both the FeaturedFeeds convention and a future convention to help us find the TFA title.
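Under that approach, a client would only need to find the marked element on the main page and read the first titled link inside it. The sketch below assumes a purely hypothetical marker class passed in by the caller; no such class or microformat is defined on any wiki today, and the sample HTML is made up.

```python
from html.parser import HTMLParser
from typing import Optional


class MarkedLinkFinder(HTMLParser):
    """Find the first titled <a> nested inside an element carrying marker_class."""

    def __init__(self, marker_class: str):
        super().__init__()
        self.marker_class = marker_class
        self.marker_tag: Optional[str] = None  # tag name of the open marked element
        self.depth = 0  # nesting depth of that tag name while inside the marker
        self.title: Optional[str] = None

    def handle_starttag(self, tag, attrs):
        if self.title is not None:
            return  # already found a link
        attributes = dict(attrs)
        if self.marker_tag is None:
            classes = (attributes.get("class") or "").split()
            if self.marker_class in classes:
                self.marker_tag, self.depth = tag, 1
        elif tag == self.marker_tag:
            self.depth += 1
        elif tag == "a" and attributes.get("title"):
            self.title = attributes["title"]

    def handle_endtag(self, tag):
        if self.marker_tag is not None and tag == self.marker_tag:
            self.depth -= 1
            if self.depth == 0:
                self.marker_tag = None  # left the marked element


def marked_title(html: str, marker_class: str) -> Optional[str]:
    finder = MarkedLinkFinder(marker_class)
    finder.feed(html)
    finder.close()
    return finder.title


# "tfa-hypothetical" is an invented class name for illustration only.
page = ('<div id="mp-tfa"><b><a href="/wiki/A" title="A">A</a></b></div>'
        '<div class="mp-box tfa-hypothetical"><p><a href="/wiki/B" title="B">B</a></p></div>')
print(marked_title(page, "tfa-hypothetical"))
```

Unlike the bold-link heuristic, the result depends only on the marker, not on how each wiki styles its blurb.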

  • wotd: (Word of the Day) ["en"] (Had to take Vietnamese off the list since the MediaWiki namespace page doesn't point to a valid page, even though the text looks reasonable.) I believe we could get German added here and potentially more languages. The German Wiktionary MediaWiki:Ffeed-wotd-page is interesting since it uses a parser function to show the word of the week only on a specific day of the week. I'll need to change the script to account for this.
  • fwotd: (Foreign Word of the Day) ["en"]
  • qotd: (Wikipedia Quote of the Day): ["it"]
  • potd: (Wikipedia project-specific Picture of the Day): ["en","fa","fr","he","hu","pl","vi","zh"] (excluding Urdu: too many Scribunto errors)
  • motd: (Wikipedia project-specific Media of the Day): []
  • dyk: (Wikipedia Did you know?): ["fa", "he", "pl", "zh"]
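One way to carry these spike results into code is a static feed-to-languages table. The sketch below copies the lists from this comment, uses "featured" (the feed name in the `feed=featured` API call) for TFA with the 8 languages named earlier, and holds back 'zh' for TFA because of the variant issue. The table and function names are hypothetical, and the data is a snapshot, not a live query.

```python
from typing import Dict, List, Set

# Snapshot of the spike results above; communities can change these at any time.
FEED_LANGUAGES: Dict[str, List[str]] = {
    "featured": ["de", "en", "fa", "fr", "he", "hu", "ja", "zh"],
    "wotd": ["en"],
    "fwotd": ["en"],
    "qotd": ["it"],
    "potd": ["en", "fa", "fr", "he", "hu", "pl", "vi", "zh"],
    "motd": [],
    "dyk": ["fa", "he", "pl", "zh"],
}

# 'zh' is excluded for TFA until the language variant issue is resolved.
TEMPORARILY_EXCLUDED: Dict[str, Set[str]] = {"featured": {"zh"}}


def supported_feeds(lang: str) -> List[str]:
    """Feeds usable for a language, minus temporary exclusions, sorted by name."""
    return sorted(
        feed
        for feed, langs in FEED_LANGUAGES.items()
        if lang in langs and lang not in TEMPORARILY_EXCLUDED.get(feed, set())
    )


print(supported_feeds("en"))
```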

Not included in FeaturedFeeds yet, but I think it would be useful in the future:

  • itn: (In The News): It would be great to get a link to In the News. We would still need a convention for marking up news stories and links so we can parse those pages without language-specific code.

Not useful IMO:

  • Good articles: I haven't seen any Wikipedia projects use a page like MediaWiki:Ffeed-good-page.
  • onthisday: Current day only. We are better off using Wikidata item links so we can address an arbitrary day of the year.

> The main blockers for getting TFA for arbitrary dates are really big differences between Wikipedia projects in how they handle TFA

This sounds more like a benefit. The community will take care of making sure that the feed makes sense locally.

> Different PHP time formats are used in the templates to specify the date.

I thought the RSS/Atom feeds were just standard feeds. The subpage format only matters for the configuration of FeaturedFeeds, so you don't have to worry about it.

> dewiki only addresses a week's worth of TFA articles

Again, this is a feature. If they think it makes sense that way, there is probably a reason.

> Languages we can cover through this are 8 languages in total

Really? Hundreds of Wikimedia wikis use similar systems in their main pages. It only takes a bit of work to configure FeaturedFeeds in all/most of them (and live with the fact that each wiki has a different view on what is interesting for users).

I did another run. This time I just printed the wikitext instead of expanding the templates. Overview:

  • tfa: 17: ["bg","cs","de","el","en","fa","fr","he","hu","ja","la","no","ta","tg","ur","vi","zh"]
  • wotd: 3: ["de","en","vi"]
  • fwotd: 1: ["en"]
  • qotd: 1: ["it"]

The details are pasted to P5015. As I hinted at in my previous comment, some of the projects didn't print a value because they explicitly try to suppress duplicate entries using a parser function ({{#ifeq}} or {{#ifexist}}).
17 languages for TFA is much better than 8. I'd still love to get even more Wikipedia projects on board.
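A rough way to flag setup pages that need this kind of special handling is to scan their raw wikitext for the two parser functions named above. This is a heuristic sketch (it does not evaluate the conditions), the function name is hypothetical, and the sample wikitext is made up for illustration.

```python
import re
from typing import List

# Openings of {{#ifeq: and {{#ifexist:, allowing whitespace after "{{".
# Matched case-insensitively to be lenient about how editors write them.
CONDITIONAL_PF = re.compile(r"\{\{\s*#(ifeq|ifexist)\s*:", re.IGNORECASE)


def conditional_parser_functions(wikitext: str) -> List[str]:
    """Sorted, de-duplicated names of conditional parser functions in wikitext."""
    return sorted({m.group(1).lower() for m in CONDITIONAL_PF.finditer(wikitext)})


# Made-up sample, not taken from any wiki.
sample = "{{#ifeq:{{CURRENTDOW}}|1|[[Example]]|}}"
print(conditional_parser_functions(sample))
```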

I think we could now actually use the URI format for the FeaturedFeeds items directly. I would love to get a real API for this so we don't have to download headers and footers unnecessarily. @Tgr showed me a workaround we could use until then: add ?useskin=apioutput to the end. So, for enwiki tfa we could use the following URI: https://en.wikipedia.org/wiki/Special:FeedItem/featured/20170306000000/en?useskin=apioutput.
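Applying that workaround to any FeedItem URL is a small query-string edit. A minimal standard-library sketch (the helper name is hypothetical; `useskin=apioutput` itself comes from the comment above):

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def with_query_param(url: str, key: str, value: str) -> str:
    """Return url with key=value set in its query string, replacing any existing key."""
    parts = urlsplit(url)
    query = [(k, v) for k, v in parse_qsl(parts.query, keep_blank_values=True) if k != key]
    query.append((key, value))
    return urlunsplit(parts._replace(query=urlencode(query)))


item = "https://en.wikipedia.org/wiki/Special:FeedItem/featured/20170306000000/en"
print(with_query_param(item, "useskin", "apioutput"))
```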

I also checked for repeats in the output and scanned the wikitext used in the setup pages. It looks like dewiki is the only one of the 17 TFA candidates that uses days of the week and therefore should be limited to the last seven days. To address an undesirable behavior in the FeaturedFeeds output for dewiki, I filed T159664.
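The seven-day limit for dewiki could be enforced with a per-wiki lookback table, sketched below. Only the dewiki entry comes from this comment; treating every other wiki as unlimited and rejecting future dates are assumptions of this sketch, and the names are hypothetical.

```python
from datetime import date, timedelta

# Days back (today included) each wiki can serve. dewiki keys TFA by day of the
# week, so only the last seven days are meaningful. Unlisted wikis: no limit assumed.
LOOKBACK_DAYS = {"de": 7}


def is_date_supported(wiki_lang: str, day: date, today: date) -> bool:
    if day > today:
        return False  # assumption: future dates are not requested
    limit = LOOKBACK_DAYS.get(wiki_lang)
    if limit is None:
        return True
    return (today - day) < timedelta(days=limit)


today = date(2017, 3, 6)
print(is_date_supported("de", date(2017, 2, 28), today))
```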

Another exception: for nowiki we should use a different language code ('nb') than the one in the domain name.
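Building FeedItem URLs then needs the date as a `YYYYMMDD000000` timestamp plus a per-wiki language override. The sketch below only knows the nowiki to 'nb' case from this task. It uses the canonical `Special:` prefix rather than the localized `Spesial:` seen in the nowiki link, assuming canonical namespace names resolve on every language wiki (standard MediaWiki behavior). The names are hypothetical.

```python
from datetime import date

# Wikis whose FeedItem URLs use a different language code than the subdomain.
# Only nowiki -> "nb" is documented in this task; there may be others.
FEED_LANGUAGE_OVERRIDES = {"no": "nb"}


def feed_item_url(wiki_lang: str, feed: str, day: date) -> str:
    """Build a Special:FeedItem URL for one day's item of a feed."""
    lang = FEED_LANGUAGE_OVERRIDES.get(wiki_lang, wiki_lang)
    timestamp = day.strftime("%Y%m%d") + "000000"  # midnight, YYYYMMDDhhmmss
    return (
        f"https://{wiki_lang}.wikipedia.org/wiki/Special:FeedItem/"
        f"{feed}/{timestamp}/{lang}"
    )


print(feed_item_url("no", "featured", date(2017, 1, 16)))
```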

Had to remove two more languages from the list: ta and tg don't have any featured articles showing up in the feed.

Now we're at 14 for tfa: ["bg","cs","de","el","en","fa","fr","he","hu","ja","la","no","ur","vi"]

IMO it's better to approach this from the other direction:

  • figure out what the MCS requirements are (just having the wiki support FeaturedFeeds, or something beyond that?)
  • make sure the method to add support is well-documented and easy to understand
  • reach out via Tech News / tech ambassador mailing list, explain the benefits of FF and ask communities to add support to as many wikis as possible

Change 340800 merged by jenkins-bot:
[mediawiki/services/mobileapps@master] Spike: Which language projects support FeaturedFeed extension

https://gerrit.wikimedia.org/r/340800

> reach out via Tech News / tech ambassador mailing list, explain the benefits of FF and ask communities to add support to as many wikis as possible

This was already done a few years ago, and of course it bears repeating, but especially on smaller wikis it's often easier to show by example (just do it, announce that you did it, and let users correct any mistakes).