Extracts API: Extracts strips lang attributes from HTML by flattening the span elements
Closed, Resolved · Public

Authored By

	TheDJ
	Nov 26 2013, 10:14 AM

Description

I really love the extracts feature, but I noticed that currently all span elements are flattened out of the cleaned-up HTML.

But one of the most common uses of span tags is to mark text in a different language or script direction, using the lang and dir attributes. Such foreign-language fragments quite often appear in the first line of an article on a non-English topic, so I think these are very important elements to preserve in our multilingual content.
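To illustrate the request, here is a minimal Python sketch of flattening spans while keeping any span that carries a `lang` or `dir` attribute. The TextExtracts extension itself is written in PHP and this is not its code; the names `SpanFlattener` and `flatten_spans` are hypothetical. The sketch uses only the standard-library `html.parser`, keeps only `lang`/`dir` on preserved spans, leaves other tags untouched, and silently discards comments.

```python
from html import escape
from html.parser import HTMLParser

# Attributes that carry language/direction information (per this report).
PRESERVED_ATTRS = ("lang", "dir")


class SpanFlattener(HTMLParser):
    """Hypothetical sketch: unwrap <span> unless it has lang or dir."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.out = []
        self.span_stack = []  # True if the matching <span> was kept

    @staticmethod
    def _start(tag, attrs):
        parts = [tag]
        for name, value in attrs:
            parts.append(name if value is None else f'{name}="{escape(value)}"')
        return "<" + " ".join(parts) + ">"

    def handle_starttag(self, tag, attrs):
        if tag == "span":
            kept = [(k, v) for k, v in attrs if k in PRESERVED_ATTRS]
            self.span_stack.append(bool(kept))
            if kept:
                self.out.append(self._start("span", kept))
            return
        self.out.append(self._start(tag, attrs))

    def handle_startendtag(self, tag, attrs):
        # Self-closing tags such as <br/>; an empty <span/> is simply dropped.
        if tag != "span":
            self.out.append(self._start(tag, attrs)[:-1] + "/>")

    def handle_endtag(self, tag):
        if tag == "span":
            # Only emit </span> if the matching opening tag was kept.
            if self.span_stack and self.span_stack.pop():
                self.out.append("</span>")
            return
        self.out.append(f"</{tag}>")

    def handle_data(self, data):
        # Character references were decoded by the parser; re-escape text.
        self.out.append(escape(data, quote=False))


def flatten_spans(html):
    parser = SpanFlattener()
    parser.feed(html)
    parser.close()
    return "".join(parser.out)
```

For example, `flatten_spans('<p>Bengali (<span lang="bn" class="x">বাংলা</span>, <span class="nowrap">Bangla</span>)</p>')` returns `'<p>Bengali (<span lang="bn">বাংলা</span>, Bangla)</p>'`: the language-tagged span survives and the purely presentational one is unwrapped.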

Version: master
Severity: normal

Details

Reference: bz57582

	Subject	Repo	Branch	Lines +/-
	Don't flatten spans	mediawiki/extensions/TextExtracts	master	+38 -1


Related Objects

Mentioned In: rMEXT17bbf4a9d998: Updated mediawiki/extensions Project: mediawiki/extensions/TextExtracts…
rETEX59633e2be9d9: Don't flatten spans

Event Timeline

• bzimport raised the priority of this task to Needs Triage. Nov 22 2014, 2:19 AM

• bzimport added a project: TextExtracts.

• bzimport set Reference to bz57582.

• bzimport added a subscriber: Unknown Object (MLST).

TheDJ created this task. Nov 26 2013, 10:14 AM

bingle-admin wrote:

Prioritization and scheduling of this bug is tracked on Mingle card https://wikimedia.mingle.thoughtworks.com/projects/mobile/cards/1454

Can you provide an example of real-life breakages caused by this removal?

Font selection for the Bengali language article extract probably fails for many people in this result: https://en.wikipedia.org/w/api.php?action=query&prop=extracts&exintro&titles=Bengali_language&format=jsonfm

There is no indication that another font needs to be used for this fragment, so only glyph fallback can save you. Text-to-speech software also won't know when to switch to a different voice.

You could make a similar argument for the font-family CSS style attribute, which ULS depends on for IPA, for instance. But since ULS can also use the lang attribute, I think the language attributes are a tad more important.
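If an extract cleaner were also to keep styling hints such as `font-family`, the per-span decision might look like the following hypothetical policy function, `preserved_span_attrs`, sketched in Python. It takes a span's attributes as a dict and returns only what is worth keeping (`lang`, `dir`, and any `font-family` declarations from an inline `style`); an empty result means the span can be flattened. The CSS handling is a naive split on `;`, an assumption for illustration rather than a real CSS parser, and none of this describes what TextExtracts actually does.

```python
import re

_FONT_FAMILY = re.compile(r"font-family\s*:", re.IGNORECASE)


def preserved_span_attrs(attrs):
    """Hypothetical policy: return the span attributes worth keeping.

    An empty dict means the span carries nothing useful and can be unwrapped.
    """
    keep = {}
    for name, value in attrs.items():
        if name in ("lang", "dir"):
            keep[name] = value
        elif name == "style" and value:
            # Naive split of inline CSS into declarations; keep font-family only.
            decls = [d.strip() for d in value.split(";")]
            fonts = [d for d in decls if _FONT_FAMILY.match(d)]
            if fonts:
                keep["style"] = "; ".join(fonts)
    return keep
```

For instance, a span with `class="IPA"` and `style="color: red; font-family: 'Charis SIL'"` would keep only `style="font-family: 'Charis SIL'"`, while a span with just `class="nowrap"` yields an empty dict and would be flattened.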

bingle-admin wrote:

Prioritization and scheduling of this bug is tracked on Mingle card https://wikimedia.mingle.thoughtworks.com/projects/mobile/cards/1479

• Awjrichards unsubscribed. Dec 3 2014, 5:44 PM

Change 183496 had a related patch set uploaded (by Phuedx):
Don't flatten spans

https://gerrit.wikimedia.org/r/183496

Patch-For-Review

@MaxSem, is it really as simple as 183496? What have I missed?