Add an option to obtain wikitext excerpts
Open, Low, Public

Description

It would be useful for certain modules and gadgets to be able to obtain an excerpt in wikitext format, so that it can be passed back to {{#invoke:}} callers.

Event Timeline

Jdlrobson subscribed.

What would a wikitext extract look like? Number of characters? Consider a wiki page with a large infobox template invocation at the beginning... what would you expect it to return?

@Jdlrobson, we should apply the same algorithm as for HTML extracts (i.e. no templates, and, from what I can determine, markup does not count towards the number of characters). The idea here, if I understand it correctly, is to obtain a short overview of the article to display in search results and article lists.
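To make the character-counting rule above concrete, here is a minimal sketch, not TextExtracts' actual implementation. It assumes that "markup" means only wiki links and bold/italic quote runs, and that an excerpt is cut at paragraph boundaries so no markup is ever split. The function names (`visibleText`, `visibleLength`, `takeParagraphs`) are hypothetical.

```javascript
// Reduce wikitext to the characters a reader would actually see.
// Assumption: only [[links]] and ''/''' quote runs are treated as markup.
function visibleText(wikitext) {
  return wikitext
    // [[target|label]] -> label, [[target]] -> target
    .replace(/\[\[(?:[^\]|]*\|)?([^\]]*)\]\]/g, '$1')
    // drop runs of two or more apostrophes (italic/bold markers)
    .replace(/'{2,}/g, '');
}

// Length of the text with markup excluded from the count.
function visibleLength(wikitext) {
  return visibleText(wikitext).length;
}

// Keep whole paragraphs until the visible-character budget would be exceeded.
// The first paragraph is always kept, even if it is over budget.
function takeParagraphs(wikitext, maxVisibleChars) {
  const kept = [];
  let used = 0;
  for (const para of wikitext.split(/\n{2,}/)) {
    const len = visibleLength(para);
    if (kept.length > 0 && used + len > maxVisibleChars) break;
    kept.push(para);
    used += len;
  }
  return kept.join('\n\n');
}
```

Cutting at paragraph boundaries sidesteps the harder problem of truncating in the middle of a link or a bold run, at the cost of a less precise length.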

That's exactly how we display the featured articles on the front page of the Romanian Wikipedia: we get the wikitext of section 0, then try to keep only the formatted text. We currently do this with text parsing in Lua; it would be great if it could be done using this extension instead.
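For readers unfamiliar with the "wikitext of section 0" step, the following sketch shows one way to request it from the Action API from JavaScript. It is an illustration, not the Lua code used on the Romanian Wikipedia. The helper names (`buildLeadWikitextUrl`, `contentFromResponse`) are hypothetical, and the API base URL is passed in by the caller.

```javascript
// Build a prop=revisions query for the wikitext of section 0 (the lead).
function buildLeadWikitextUrl(apiBase, title) {
  const params = new URLSearchParams({
    action: 'query',
    format: 'json',
    formatversion: '2',
    prop: 'revisions',
    rvprop: 'content',
    rvslots: 'main',
    rvsection: '0', // only the section before the first heading
    titles: title,
  });
  return `${apiBase}?${params}`;
}

// Pull the wikitext out of a formatversion=2 response.
// Handles both the slot-based shape and the legacy shape; returns null if absent.
function contentFromResponse(json) {
  const rev = json?.query?.pages?.[0]?.revisions?.[0];
  return rev?.slots?.main?.content ?? rev?.content ?? null;
}
```

A caller would typically fetch the URL, parse the JSON body, and pass the result to `contentFromResponse`. Templates and references still have to be filtered out of the returned wikitext afterwards.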

Jdlrobson added a project: patch-welcome.

Since it's a new feature, I'm setting this to low for the time being. The Web team is unlikely to get a chance to work on this any time soon, as we're focusing on improving the existing HTML output of TextExtracts (T113094), but I agree this would be cool and useful.

Just to clarify I fully understand - are you asking for text or wikitext markup?

https://en.m.wikipedia.org/wiki/Special:ApiSandbox#action=query&format=json&prop=extracts&titles=San+Francisco&explaintext=1 seems to do a good job of getting the text.

Could you clarify the end result - what do you want to render?

Wikitext, since I need the markup. You can see what I'm trying to achieve at https://ro.wikipedia.org/wiki/Pagina_principal%C4%83 , under the heading "Conținut recomandat". It's the introduction of featured articles, processed using Lua in a very similar way to this extension. The same code is used in several portals to display the article of the day. These are not "classical" extracts, but longer ones, up to several paragraphs, which makes it dull to display without any links to other articles.

Having TextExtracts expose a Lua interface would make the code far easier to maintain.

@Strainu, I'm struggling to see what you are doing (I'm not that comfortable with templates or the Romanian language :)). Could you put some examples in your sandbox to help define what this would look like?

e.g.

After thinking about this a little, this feels like it's out of scope for TextExtracts (which is also in maintenance mode) and might be better done in another extension.

Implementation-wise, it would make sense to take the output of prop=revisions and then manipulate the content, e.g. remove any templates and possibly any ref tags.

The following code would get you partially to where you'd want:

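// Fetch the page wikitext and keep only the lead section, minus simple templates.
$.ajax('/w/api.php?action=query&format=json&prop=revisions&titles=Utilizator%3AStrainu%2FExample1&rvprop=content&formatversion=2')
  .then((json) => {
    var content = json.query.pages[0].revisions[0].content;
    // Cut at the first section heading; keep everything if the page has none.
    var headingAt = content.indexOf('\n=');
    if (headingAt !== -1) { content = content.slice(0, headingAt); }
    // Drop {{...}} templates (non-greedy, may span lines; nested templates are not handled).
    content = content.replace(/\{\{[\s\S]*?\}\}/g, '');
    console.log(content);
    return content;
  });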
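A single regular expression struggles with nested templates, such as an infobox whose parameters contain other templates. As a rough sketch of the "remove any templates, possibly any ref tags" idea, the code below tracks brace depth instead. It is an illustration under simplifying assumptions, not what TextExtracts or Module:Excerpt do. The function names are hypothetical, and triple-brace parameters (`{{{x}}}`), parser tags such as `<nowiki>`, and tables are not handled.

```javascript
// Return everything before the first heading line (a line starting with "=").
function leadSection(wikitext) {
  const headingAt = wikitext.search(/^=/m);
  return headingAt === -1 ? wikitext : wikitext.slice(0, headingAt);
}

// Remove {{...}} templates, including nested ones, by tracking brace depth.
function stripTemplates(wikitext) {
  let out = '';
  let depth = 0;
  let i = 0;
  while (i < wikitext.length) {
    const pair = wikitext.slice(i, i + 2);
    if (pair === '{{') {
      depth += 1;
      i += 2;
    } else if (pair === '}}' && depth > 0) {
      depth -= 1;
      i += 2;
    } else {
      // Copy characters only while outside every template.
      if (depth === 0) out += wikitext[i];
      i += 1;
    }
  }
  return out;
}

// Remove <ref .../> and <ref>...</ref>; self-closing tags go first so that
// a later </ref> is not paired with them by mistake.
function stripRefs(wikitext) {
  return wikitext
    .replace(/<ref\b[^>]*\/>/gi, '')
    .replace(/<ref\b[^>]*>[\s\S]*?<\/ref>/gi, '');
}

// Lead section with templates and references removed, other markup kept.
function wikitextExcerpt(wikitext) {
  return stripRefs(stripTemplates(leadSection(wikitext))).trim();
}
```

Templates are stripped before references because citations usually sit inside `<ref>` as templates. Once those are gone, the remaining empty `<ref></ref>` pairs are removed by the second pass. Links and bold/italic markup are left in place, which is the point of a wikitext excerpt as opposed to a plain-text one.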

It might be worth setting up a new extension if this is useful to encapsulate somewhere and helps various use cases. I'd need to understand the use cases a little more, however...

The WikipediaExtracts extension does this (although that probably doesn't help Strainu since it is not deployed on Wikimedia sites).

@Tgr, is WikipediaExtracts deployable, or does it need more work?

A lot. Probably way easier to add the functionality to TextExtracts.

Obtaining a wikitext excerpt is already possible in Lua; see Module:Excerpt on English Wikipedia

@Evad37: thanks. I wrote a similar module myself last year, but there are two problems with such a module: it needs regular maintenance, because new templates have to be added to the exclusion list, and it fragments the implementation between wikis, potentially leading to slightly different results. Using the same codebase throughout Wikimedia sites seems preferable.

Masumrezarock100 added a subscriber: ovasileva.
Masumrezarock100 subscribed.

Assigning to @ovasileva since she is the product owner of the Reading Web team.

I'm actively working on the Excerpt and Transcluder modules, making them more powerful and abstract so as to allow for identical reuse on every wiki.