Today’s Featured Article and In the News is not shown in Hungarian mobile app
Open, Low, Public

Description

The mobile apps’ so-called “Explore feed” contains various “cards” with varying content. Two of them are the main page’s Featured Article and In the News boxes. These work well when the Wikipedia language is English but not in Hungarian (tested on Android but I’m sure it’s the same on iOS). Making them work might need modifying the main page or any of its templates; I have the right to edit all of them.

Event Timeline

Restricted Application added a subscriber: Aklapper.

@Tacsipacsi You are correct, we will need templates / main pages updated across all languages. We are working on a markup spec to support this right now!

Any input / feedback you could provide to help us finalize the spec and get it implemented would be much appreciated!

I will post some of the initial ideas here later this week

bearND added a subscriber: bearND. Dec 14 2016, 6:08 PM

We have a task for adding Featured Article to other languages besides English I want to mention here: T150806 so we don't lose the link.

Thank you for offering to make the necessary changes to the main page templates to help us better parse out the Featured Article title and news items.
I can try to add Hungarian 'In the news' sometime, though. Eventually we want to have some markers added to the 'In the news' templates as well as Featured articles and future feed functionality, so we don't have to make one-off implementations and just rely on special markup provided in the templates by template editors.

Thank you for taking care of it. I’m sure it should use some CSS class(es) (as classes are used for many other language-independent features, e.g. infobox so that the app shows “Quick facts” instead of “More information”) and I meant adding those to the main page boxes. I’m not good at naming things, so please do not ask me about its name. :)
I don’t know if our featured article can be properly handled, i.e. showing each article only once, as we have a complicated system with two articles a week: the first from the beginning of the week (Monday at 0:00) until Thursday noon, and another for the rest of the week. In the RSS feed, the first article currently appears four times and the second three times.

@Tacsipacsi The holidays slowed me down a bit, but here is a straw man for the markup being proposed. Please let me know if you have any thoughts or proposed changes.

https://www.mediawiki.org/wiki/User:CFloyd_(WMF)/Feed_Markup_Documentation

Should I comment here or on the talk page? The biggest problem I see is that certain links should have CSS classes—AFAIK, it's not possible in wikitext. Also, these descriptions are written by humans, not geeks, so the text itself shouldn't contain complicated codes. On Hungarian Wikipedia, there's a More... link at the bottom of the summary, that can have CSS class, e.g.

<div class="tfa-article" style="...">
    <a href="/wiki/Fő_tér_(Kolozsvár)" title="Fő tér (Kolozsvár)">Tovább a szócikkhez</a>
</div>

Images are also a piece that ideally should not be bothered with classes. I think the easiest is to just find the first (non-icon, i.e. larger than e.g. 100px) image in .tfa-summary. I haven't checked the news yet.
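
The "first non-icon image in .tfa-summary" rule suggested above could be sketched roughly as follows. This is only an illustration under stated assumptions, not how the app or MCS actually extracts images: the class name `FirstLargeImageFinder`, the function `first_large_image`, and the reliance on a numeric `width` attribute are all hypothetical, and the 100px threshold is the example value from the comment.

```python
from html.parser import HTMLParser


class FirstLargeImageFinder(HTMLParser):
    """Hypothetical helper: finds the first <img> wider than min_width
    inside the first element carrying container_class."""

    def __init__(self, container_class, min_width):
        super().__init__()
        self.container_class = container_class
        self.min_width = min_width
        self.open_tags = []  # tags opened since entering the container
        self.src = None

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if self.open_tags:
            # Inside the container: accept the first sufficiently wide image.
            width = attrs.get("width") or ""
            if (tag == "img" and self.src is None
                    and width.isdigit() and int(width) > self.min_width):
                self.src = attrs.get("src")
            self.open_tags.append(tag)
        elif self.container_class in (attrs.get("class") or "").split():
            self.open_tags.append(tag)

    def handle_endtag(self, tag):
        # Pop back to the matching tag; this also discards unclosed
        # void elements such as <img> or <br>.
        if tag in self.open_tags:
            while self.open_tags.pop() != tag:
                pass


def first_large_image(html, container_class="tfa-summary", min_width=100):
    finder = FirstLargeImageFinder(container_class, min_width)
    finder.feed(html)
    return finder.src
```

An implicit rule like this depends on the order and size of images in the box, so it breaks as soon as a template changes either of them.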

Fjalapeno added a comment. Edited Jan 3 2017, 10:29 PM

Should I comment here or on the talk page?

@Tacsipacsi you can comment here, thanks for looking!

The biggest problem I see is that certain links should have CSS classes—AFAIK, it's not possible in wikitext.

Hmm… when we look at the HTML for these pages they seem to already include CSS classes (and ids as well).

The HTML below is from today's main page. Do you know how these classes have been added?

<div id="mp-tfa" style="padding:2px 5px">
<div id="mp-tfa-img" style="float: left; margin: 0.5em 0.9em 0.4em 0em;">
<div class="thumbinner mp-thumb" style="background: transparent; border: none; padding: 0; max-width: 150px;"><a href="/wiki/File:Smilodon_californicus_mount.jpg" class="image" title="S. fatalis skeleton at the National Museum of Natural History, Washington, D.C."><img alt="S. fatalis skeleton at the National Museum of Natural History, Washington, D.C." src="//upload.wikimedia.org/wikipedia/commons/thumb/7/78/Smilodon_californicus_mount.jpg/150px-Smilodon_californicus_mount.jpg" width="150" height="100" srcset="//upload.wikimedia.org/wikipedia/commons/thumb/7/78/Smilodon_californicus_mount.jpg/225px-Smilodon_californicus_mount.jpg 1.5x, //upload.wikimedia.org/wikipedia/commons/thumb/7/78/Smilodon_californicus_mount.jpg/300px-Smilodon_californicus_mount.jpg 2x" data-file-width="1248" data-file-height="832" /></a></div>
</div>
<p><i><b><a href="/wiki/Smilodon" title="Smilodon">Smilodon</a></b></i>, the saber-toothed tiger, is an <a href="/wiki/Extinction" title="Extinction">extinct</a> <a href="/wiki/Genus" title="Genus">genus</a> of <a href="/wiki/Machairodontinae" title="Machairodontinae">machairodont</a> <a href="/wiki/Felid" class="mw-redirect" title="Felid">felid</a> that lived in the <a href="/wiki/Americas" title="Americas">Americas</a> between 2.5 million and 10,000 years ago, during the <a href="/wiki/Pleistocene" title="Pleistocene">Pleistocene</a> epoch. It was named in 1842 and identified by fossils from Brazil. The largest collection of its fossils has come from the <a href="/wiki/La_Brea_Tar_Pits" title="La Brea Tar Pits">La Brea Tar Pits</a> in <a href="/wiki/Los_Angeles" title="Los Angeles">Los Angeles</a>, California. The <a href="/wiki/Species" title="Species">species</a> <i>S.&#160;gracilis</i> and <i>S.&#160;fatalis</i> lived mostly in North America. A third species, the South American <i>S.&#160;populator</i> (meaning <i>destroyer</i>), was perhaps the largest known member of the <a href="/wiki/Felidae" title="Felidae">family of cats</a>, at 220 to 400&#160;kg (490 to 880&#160;lb) and 120&#160;cm (47&#160;in) in height. Overall, saber-toothed tigers were stronger than any modern cat, with well-developed forelimbs, big jaws, and long, slender upper <a href="/wiki/Canine_(tooth)" class="mw-redirect" title="Canine (tooth)">canines</a>, adapted for precision killing. In North America, they hunted large herbivores such as <a href="/wiki/Bison_antiquus" title="Bison antiquus">bison</a> and <a href="/wiki/Camelops" title="Camelops">camels</a>, pinning their prey before biting it. They probably lived in habitats that provided cover for ambushing prey, such as forests and <a href="/wiki/Shrubland" title="Shrubland">shrubland</a>. 
They died out at the same time that most North and South American <a href="/wiki/Pleistocene_megafauna" title="Pleistocene megafauna">megafauna</a> disappeared, during a period of <a href="/wiki/Climate_change" title="Climate change">climate change</a> about 10,000 years ago. (<a href="/wiki/Smilodon" title="Smilodon"><b>Full&#160;article...</b></a>)</p>

Also, these descriptions are written by humans, not geeks, so the text itself shouldn't contain complicated codes. On Hungarian Wikipedia, there's a More... link at the bottom of the summary, that can have CSS class, e.g.

<div class="tfa-article" style="...">
    <a href="/wiki/Fő_tér_(Kolozsvár)" title="Fő tér (Kolozsvár)">Tovább a szócikkhez</a>
</div>

Images are also a piece that ideally should not be bothered with classes. I think the easiest is to just find the first (non-icon, i.e. larger than e.g. 100px) image in .tfa-summary. I haven't checked the news yet.

This is actually how we do things now, and for us to expand to more languages we have to move to something that is more explicit.

The reason is that it is very difficult to reliably parse these pages with implicit rules like the ones you are suggesting. If a page ever changes its format, or an editor just doesn't understand why the order is important, then it will break.

This is why we support so few languages now: we can't keep up with all the subtle differences between the different projects, so we are trying to introduce a system that is explicit and allows the editors to mark the content.

We were hoping these types of changes could be added to the templates so editors won't have to do this manually - since not everyone is a geek as you said. So the best of both worlds: easy for us to write code that can extract the content and easy for editors to make changes.

Tacsipacsi added a comment. Edited Jan 3 2017, 11:40 PM

The lack of classes only applies to text links (i.e. <a> tags), and not quite to images, where, if I remember correctly, <a> tags can get extra classes but <img>s can’t. The code snippet I wrote is the most explicit way that would work for text links. Images, as I said, can get extra classes (I think on their <a> tag). Is it OK if all images on the main page get the tfa-img or whatever class? This currently means two images within the .tfa-summary div, and many others outside it. The two in the div are relevant to the featured article (any of them can be on the card), but all images on the page use the same template.

@Tacsipacsi Ahh, I think I am following. So you suggest wrapping ALL the content for the TFA in a div, is that correct?

So going with your suggestion, if we had the div you suggested on every Main page:

<div class="tfa-article" style="...">
    <a href="/wiki/Fő_tér_(Kolozsvár)" title="Fő tér (Kolozsvár)">Tovább a szócikkhez</a>
</div>

As long as that is the only content within the div, that will work.

Note: we don't need the summary of the thumbnail, but we thought it would be nice to allow some editorial content here. It is totally ok if we exclude them, just keep in mind that we will use the article to grab a summary and a thumbnail. Does that sound reasonable?

Fjalapeno added a subscriber: Tgr. Jan 4 2017, 5:29 PM

Adding Gergo to review the spec as well

I just followed your idea:

<div class="tfa-summary">
    <a class="image tfa-thumbnail">
        <img />
    </a>
    <p>Summary lorem impsum dolor sit amet.</p>
    <p>It may be several paragraphs long.</p>
    <div class="tfa-article">
        <a>Full article</a>
    </div>
</div>

As long as that is the only content within the div, that will work.

You can define as the first <a> in .tfa-article, then it will work most of the time. Of course, there’s no absolutely foolproof solution, humans are very clever in finding new ways to break tools. :)
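
As a rough illustration of the "first <a> in .tfa-article" rule, a consumer could do something like the sketch below. The names `FirstLinkFinder` and `first_link_href` are hypothetical and only the Python standard library is used; this is not the code MCS or the apps run.

```python
from html.parser import HTMLParser


class FirstLinkFinder(HTMLParser):
    """Hypothetical helper: records the href of the first <a> nested
    inside an element that carries container_class."""

    def __init__(self, container_class):
        super().__init__()
        self.container_class = container_class
        self.open_tags = []  # tags opened since entering the container
        self.href = None

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if self.open_tags:
            # Inside the container: keep only the first link we meet.
            if tag == "a" and self.href is None:
                self.href = attrs.get("href")
            self.open_tags.append(tag)
        elif self.container_class in (attrs.get("class") or "").split():
            self.open_tags.append(tag)

    def handle_endtag(self, tag):
        # Pop back to the matching tag so unclosed void elements are dropped.
        if tag in self.open_tags:
            while self.open_tags.pop() != tag:
                pass


def first_link_href(html, container_class="tfa-article"):
    finder = FirstLinkFinder(container_class)
    finder.feed(html)
    return finder.href
```

Because only the first link inside the marked element is read, extra links added later in the same div would not change the result, which is what makes the rule tolerant of small template edits.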

Note: we don't need the summary of the thumbnail…

Did you mean “summary or the thumbnail”? Otherwise it doesn’t make any sense to me.
It also doesn’t make sense to me to mark some content machine-readably but not use it. I think if there’s a marked summary and image on the main page, the app should use that, and the article should act as a fallback in case of missing main page content.

Tgr added a comment. Jan 4 2017, 8:11 PM

It's probably best to use "first link/image within <some class>" instead of being more specific, as link and image markup cannot be created by hand (links cannot get classes at all; image markup is a mess that changes wildly depending on what wikitext directives you use and is completely different in Parsoid). Arbitrary attributes also cannot be created, everything should be prefixed with data-.

(Another alternative is to expect data instead of real markup in the first place, so something like <div class="itn-story" data-itn-article="2016 elections in Foobaria" data-itn-image="File:Foobaria election results.png">...</div>. Or microdata, or whatever.)
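
To show what the data-attribute alternative would mean for a consumer, here is a minimal sketch that reads the example element above. The attribute names come from the comment; the class `ItnStoryCollector` and the function `collect_itn_stories` are hypothetical and assume nothing about how MCS is actually written.

```python
from html.parser import HTMLParser


class ItnStoryCollector(HTMLParser):
    """Hypothetical helper: collects data-itn-* values from every
    element that has the itn-story class."""

    def __init__(self):
        super().__init__()
        self.stories = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if "itn-story" in (attrs.get("class") or "").split():
            self.stories.append({
                "article": attrs.get("data-itn-article"),
                "image": attrs.get("data-itn-image"),
            })


def collect_itn_stories(html):
    collector = ItnStoryCollector()
    collector.feed(html)
    return collector.stories
```

With this shape the consumer never has to look inside the free-text part of the story, so the wording and link markup can change without affecting extraction.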

I would also relax expectations about the structure, such as .itn-article being inside .itn-story (unless you actually need it to attach special behavior to links in the story). Mainpage content contributors shouldn't be exposed to HTML markup, so ideally it should be possible to generate the markup from a template like

{{news item|
|text=...
|image=...
|main article=...
|other article 1=...
}}

Tgr added a comment. Jan 4 2017, 8:17 PM

Two more small things:

  • I don't think it's realistic to expect this kind of handcrafting in articles of calendar days, where being easy for beginners to edit is probably a bigger concern than being machine-readable. Is there a use case for returning non-current anniversaries?
  • most wikis don't have a current events portal so you should probably fall back to the main page if it doesn't exist.

I just followed your idea:

<div class="tfa-summary">
    <a class="image tfa-thumbnail">
        <img />
    </a>
    <p>Summary lorem impsum dolor sit amet.</p>
    <p>It may be several paragraphs long.</p>
    <div class="tfa-article">
        <a>Full article</a>
    </div>
</div>

As long as that is the only content within the div, that will work.

You can define as the first <a> in .tfa-article, then it will work most of the time. Of course, there’s no absolutely foolproof solution, humans are very clever in finding new ways to break tools. :)

That div looks good to me - and you are right, nothing is fool-proof… but hopefully we have a decent system in place like this that will make it at least easy.

Note: we don't need the summary of the thumbnail…

Did you mean “summary or the thumbnail”? Otherwise it doesn’t make any sense to me.

You are correct, that is a typo: “summary or the thumbnail”

It also doesn’t make sense to me to mark some content machine-readably but not use it. I think if there’s a marked summary and image on the main page, the app should use that, and the article should act as a fallback in case of missing main page content.

Absolutely, if it is possible to mark them in a way that we can find them reliably, then that would be great. I was just noting that it isn't required (we don't actually use the summary or thumbnail now, so this would be an improvement!)

It's probably best to use "first link/image within <some class>" instead of being more specific, as link and image markup cannot be created by hand (links cannot get classes at all; image markup is a mess that changes wildly depending on what wikitext directives you use and is completely different in Parsoid).

@Tgr so do you think that the snippet that @Tacsipacsi pasted above will work?

Arbitrary attributes also cannot be created, everything should be prefixed with data-.

Ahh yeah, I forgot about that in the spec… easy to change.

(Another alternative is to expect data instead of real markup in the first place, so something like <div class="itn-story" data-itn-article="2016 elections in Foobaria" data-itn-image="File:Foobaria election results.png">...</div>. Or microdata, or whatever.)
I would also relax expectations about the structure, such as .itn-article being inside .itn-story

The structure is just taken from the existing HTML of the news portal - so this is how it already looks. I just added some additional classes for the most part.

(unless you actually need it to attach special behavior to links in the story).

There is special behavior we need to attach: specifically "itn-topic-article" which denotes which links in the news story are actually associated with the topic.

Mainpage content contributors shouldn't be exposed to HTML markup, so ideally it should be possible to generate the markup from a template like

{{news item|
|text=...
|image=...
|main article=...
|other article 1=...
}}

This is what I was hoping!

Tgr added a comment. Edited Jan 5 2017, 12:44 AM

The problem is that it's a lot harder to use a template for this if the value of one parameter needs to be inserted in the middle of the value of another.

This is doable:

{{featured article|
|image=Foo.png
|text=Summary lorem impsum dolor sit amet

It may be several paragraphs long.
|article=Foo
}}

↓ ↓ ↓

<div class="tfa-summary">
    <a class="image tfa-thumbnail">
        <img src="/uploads/a/aa/Foo.png" />
    </a>
    <p>Summary lorem impsum dolor sit amet.</p>
    <p>It may be several paragraphs long.</p>
    <div class="tfa-article">
        <a href="/wiki/Foo">Full article</a>
    </div>
</div>

This is not:

{{news item|
|image=Foo.png
|text='''[[Foo]]''' has been elected president of [[Foobaristan]].
|article=Foo
}}

↓ ↓ ↓

<div class="tfa-summary">
    <a class="image tfa-thumbnail">
        <img src="/uploads/a/aa/Foo.png" />
    </a>
    <p><b class="tfa-article"><a href="/wiki/Foo">Foo</a></b> has been elected president of <a href="/wiki/Foobaristan">Foobaristan</a>.</p>
</div>

Ideally people should not be forced to change how the main page looks just so that MCS can understand its content.

You can work around it by using a whole host of templates:

{{news item|
|image=Foo.png
|text={{news item main link|Foo}} has been elected president of [[Foobaristan]].
|article=Foo
}}

but it would be better if that wasn't needed.

The structure is just taken from the existing HTML of the news portal - so this is how it already looks.

This is how it looks on one Wikipedia. We have 284 of them at the moment, and while some amount of template copying is going on, I would expect tens if not hundreds of significantly different markup patterns. (Also the enwiki community redesigns the main page every few years; presumably other large wikis too.)

There is special behavior we need to attach: specifically "itn-topic-article" which denotes which links in the news story are actually associated with the topic.

But do you need to know where exactly those links are in the story HTML snippet, or just what they are?

@Tgr thanks for looking this over

The problem is that it's a lot harder to use a template for this if the value of one parameter needs to be inserted in the middle of the value of another.

Ok, this makes sense. I am trying to get a handle on the limitations of what is possible.

{{featured article|
|image=Foo.png
|text=Summary lorem impsum dolor sit amet

It may be several paragraphs long.
|article=Foo
}}

↓ ↓ ↓

<div class="tfa-summary">
    <a class="image tfa-thumbnail">
        <img src="/uploads/a/aa/Foo.png" />
    </a>
    <p>Summary lorem impsum dolor sit amet.</p>
    <p>It may be several paragraphs long.</p>
    <div class="tfa-article">
        <a href="/wiki/Foo">Full article</a>
    </div>
</div>

I think we are in agreement on this one. This seems about what we want to do and gives us a way to extract everything we need.

Ideally people should not be forced to change how the main page looks just so that MCS can understand its content.

Agreed… we don't want to change how it looks, just allow it to be marked up in a way that is parsable

This is how it looks on one Wikipedia. We have 284 of them at the moment, and while some amount of template copying is going on, I would expect tens if not hundreds of significantly different markup patterns. (Also the enwiki community redesigns the main page every few years; presumably other large wikis too.)

I did look through several but obviously not all. Really we are trying to create a markup that most maintainers could adopt without changing their format. The basic idea is that maintainers add a few extra divs and classes to their existing templates so that we can reliably extract the content from the HTML. So any suggestions on how to accomplish this / make it easier are welcome.

There is special behavior we need to attach: specifically "itn-topic-article" which denotes which links in the news story are actually associated with the topic.

But do you need to know where exactly those links are in the story HTML snippet, or just what they are?

We do not need to know exactly where they are, just what they are. Does this make it easier?

Tgr added a comment. Jan 5 2017, 11:30 PM

We do not need to know exactly where they are, just what they are. Does this make it easier?

Yes, it means all the non-free-text information can be provided separately outside the free-text part. That makes it more flexible.

I still wonder whether something well-defined like microdata would not be more user-friendly:

<div itemscope itemtype="https://wikimedia.org/schema/TFA">
    <div itemprop="image">
        <a class="image">
            <img src="/uploads/a/aa/Foo.png" />
        </a>
    </div>
    <div itemprop="text">
        <p>Summary lorem impsum dolor sit amet.</p>
        <p>It may be several paragraphs long.</p>
    </div>
    <div itemprop="article">
        <a href="/wiki/Foo">Full article</a>
    </div>
</div>

There are parsing libraries and online tools for microdata which would make it easier for editors to check that the markup is correct, and there is a standard way of referencing elements when the DOM structure does not align with the logical structure. (OTOH using img/a in microdata is not typical. Google does it, but online validators cannot handle it nicely.)
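
For illustration, reading the microdata example above could look roughly like the sketch below. It is a deliberately simplified reader, not a conformant microdata parser (it ignores `itemscope` nesting and `itemref`), and the class `TfaMicrodataReader` and function `read_tfa_microdata` are hypothetical names; a real consumer would more likely use one of the existing parsing libraries mentioned above.

```python
from html.parser import HTMLParser


class TfaMicrodataReader(HTMLParser):
    """Hypothetical helper: groups URLs and text by the nearest
    enclosing itemprop."""

    def __init__(self):
        super().__init__()
        self.stack = []  # (tag, itemprop or None) for each open element
        self.props = {}  # itemprop name -> {"urls": [...], "text": [...]}

    def _current_prop(self):
        for _, prop in reversed(self.stack):
            if prop:
                return prop
        return None

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        prop = attrs.get("itemprop")
        if prop:
            self.props.setdefault(prop, {"urls": [], "text": []})
        self.stack.append((tag, prop))
        current = self._current_prop()
        url = attrs.get("href") or attrs.get("src")
        if current and url:
            self.props[current]["urls"].append(url)

    def handle_endtag(self, tag):
        # Pop back to the matching tag so unclosed void elements are dropped.
        if tag in [t for t, _ in self.stack]:
            while self.stack.pop()[0] != tag:
                pass

    def handle_data(self, data):
        current = self._current_prop()
        if current and data.strip():
            self.props[current]["text"].append(data.strip())


def read_tfa_microdata(html):
    reader = TfaMicrodataReader()
    reader.feed(html)
    return reader.props
```

Applied to the snippet above, the `image` property would yield the `<img>` source, `text` the paragraph contents, and `article` the link target, regardless of how deeply each is nested inside its `itemprop` element.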

Waiting to discuss featured feeds with @bearND before proceeding

T148680

Tgr added a comment. Feb 1 2017, 12:25 AM

I wrote a proposal (IMO not too hard to implement) for how these things should ideally work: T156876: Structured data side channel for wikitext

BTW, Today's Featured Article is available for huwiki since early April. See T150806.

Putting this in the backlog for now

Fjalapeno closed this task as Resolved. Oct 24 2017, 9:11 PM
Fjalapeno claimed this task.

currently in the feed, resolving

Tacsipacsi reopened this task as Open. Oct 24 2017, 9:22 PM

News part not done.

@Fjalapeno : it's unclear whether hu.wp can get the In the News section, and how?

No, the news is unavailable. We probably need to split up this ticket, since the featured article is resolved but the news is not.

There are two ways to get the news for huwiki:

  1. Hacky way:

a) Add the code to MCS specifically for huwiki. For that we need to know the page that holds the ITN template.
b) Come up with a CSS selector to find the news stories on that page (avoid any instructions if any). OR

  2. Long term preferred solution:

a) Come up with a convention to make it easy to find the ITN template. This could be similar to the convention used by the FeaturedFeed extension, where special pages in the MediaWiki namespace are used.
b) Come up with markup (class names, etc.) that wikis should use to help make more robust CSS selectors, which should be used by all wikis that have ITN.

There are two ways to get the news for huwiki:

  1. Hacky way:

a) Add the code to MCS specifically for huwiki. For that we need to know the page that holds the ITN template.
b) Come up with a CSS selector to find the news stories on that page (avoid any instructions if any). OR

The news is transcluded from {{Kezdőlap aktualitásai}}, which also stores longer-term actualities (currently Brexit and the question of Catalonian independence), and recent deaths. The list has no CSS class or any other markup to distinguish it, apart from currently being the only <ul> on the transcludable part of the page (although, as I see, neither has the English version).

  2. Long term preferred solution:

a) Come up with a convention to make it easy to find the ITN template. This could be similar to the convention used by the FeaturedFeed extension, where special pages in the MediaWiki namespace are used.
b) Come up with markup (class names, etc.) that wikis should use to help make more robust CSS selectors, which should be used by all wikis that have ITN.

I think there's no need to waste time on a hacky version if a non-hacky solution will come in the foreseeable future (this ticket has been open for nearly a year, so IMO another year is acceptable).

Tgr added a comment. Nov 15 2017, 11:56 PM

Yeah, we should prefer creating standards over creating language-specific parsing code. It's actually less effort and it's more inclusive to smaller communities who probably won't ever get enough attention to have their own code paths, but can implement the standards locally.

Does MCS have any expectations around turning the news template into a news feed? Most mainpage sections have rotating templates (daily on large wikis, weekly or something similar on smaller ones) so it's easy to create a feed of featured articles. With news there is only one template as the rotation happens on the level of the individual news items (which might spend shorter or longer time on the main page depending on how important they are). Say the news template has two items, then the next day someone removes one of them, adds a new one, and improves the wording of the one that remained. Should MCS be able to recognize that there is only one new news item for that day, and the other one is two days old (even if it's not a full text match to last day's similar news item)?
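
The question above is left open in the thread. Purely as an illustration of what "recognizing" a reworded item could involve, the sketch below matches items by a stable key instead of by their text. It assumes each news item exposes such a key, for example its main article title; that is an assumption, not something the thread has agreed on. The function `first_seen_dates` and its data shapes are hypothetical.

```python
def first_seen_dates(previous, todays_items, today):
    """Hypothetical sketch: carry over the first-seen date of each item.

    previous     -- dict mapping main article title -> date first seen
    todays_items -- list of (main article title, item text) pairs
    today        -- date to assign to items not seen before
    """
    current = {}
    for article, _text in todays_items:
        # A reworded item keeps its original date because only the
        # article title is compared, never the free text.
        current[article] = previous.get(article, today)
    return current
```

Items that were removed from the template simply drop out of the result, and an item whose wording was improved keeps its earlier date as long as its key is unchanged.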

Jhernandez triaged this task as Low priority. Jul 12 2018, 12:28 PM