
Plaintext extracts could append '*' and '#' for lists
Open, Low, Public

Description

Currently, list items are simply converted to bare lines in plain-text extracts, so the list markers are lost. Bullet list items should be prefixed with *, while numbered list items should use a localisable format (by default "%d.").

Source page wikitext:

This is a bullet list.
* One
* Two

This is a numbered list.
# One
# Two
# Three

Current output for plaintext extracts:

This is a bullet list.
One
Two

This is a numbered list.
One
Two
Three

Desired output:

This is a bullet list.
* One
* Two

This is a numbered list.
1. One
2. Two
3. Three
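
The desired behaviour can be sketched roughly as follows. This is only an illustration in Python using the standard-library HTML parser, not the TextExtracts implementation (which is PHP); the class name `ListAwareExtractor`, the helper `extract_plaintext`, and the `numbered_format` parameter are hypothetical names chosen for this example, and the input is assumed to be already-parsed HTML with `<ul>`, `<ol>` and `<li>` elements.

```python
from html.parser import HTMLParser


class ListAwareExtractor(HTMLParser):
    """Flatten HTML to plain text, prefixing list items with markers."""

    def __init__(self, numbered_format="%d."):
        super().__init__()
        # Hypothetical stand-in for a localisable message such as "%d."
        self.numbered_format = numbered_format
        self.lines = []       # finished output lines
        self.current = []     # text fragments of the line being built
        self.list_stack = []  # one [tag, item_counter] entry per open list

    def _flush(self):
        # Finish the line being built; whitespace-only lines are dropped.
        text = "".join(self.current).strip()
        if text:
            self.lines.append(text)
        self.current = []

    def handle_starttag(self, tag, attrs):
        if tag in ("ul", "ol"):
            self._flush()
            self.list_stack.append([tag, 0])
        elif tag == "li" and self.list_stack:
            self._flush()
            entry = self.list_stack[-1]
            entry[1] += 1
            if entry[0] == "ol":
                marker = self.numbered_format % entry[1]
            else:
                marker = "*"
            self.current.append(marker + " ")
        elif tag == "p":
            self._flush()

    def handle_endtag(self, tag):
        if tag in ("ul", "ol") and self.list_stack:
            self._flush()
            self.list_stack.pop()
            self.lines.append("")  # blank line after a list
        elif tag in ("li", "p"):
            self._flush()

    def handle_data(self, data):
        self.current.append(data)


def extract_plaintext(html, numbered_format="%d."):
    parser = ListAwareExtractor(numbered_format)
    parser.feed(html)
    parser.close()   # make the parser hand over any buffered text
    parser._flush()
    return "\n".join(parser.lines).strip()


sample_html = (
    "<p>This is a bullet list.</p><ul><li>One</li><li>Two</li></ul>"
    "<p>This is a numbered list.</p>"
    "<ol><li>One</li><li>Two</li><li>Three</li></ol>"
)
print(extract_plaintext(sample_html))
```

Keeping a counter per open list means a nested numbered list restarts at 1, and passing a different `numbered_format` (for example "%d)") changes only the numbered-item marker.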

Version: master
Severity: enhancement

Details

Reference
bz57850

Event Timeline

bzimport raised the priority of this task from to Needs Triage. Nov 22 2014, 2:37 AM
bzimport added a project: TextExtracts.
bzimport set Reference to bz57850.
bzimport added a subscriber: Unknown Object (MLST).
MaxSem created this task. Dec 2 2013, 5:19 PM

bingle-admin wrote:

Prioritization and scheduling of this bug is tracked on Mingle card https://wikimedia.mingle.thoughtworks.com/projects/mobile/cards/1472

Can we have a sample api request and sample text to make this task easier to follow?
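
For reference, a plain-text extract request might be built along these lines. This is a sketch only: the endpoint and the page title are placeholders assumed for illustration (they do not come from this task), and `build_extract_url` is a hypothetical helper. The query uses `prop=extracts` with the `explaintext` flag, which is what asks TextExtracts for plain text rather than limited HTML.

```python
from urllib.parse import urlencode

# Assumption: any MediaWiki installation with TextExtracts enabled would do.
API_ENDPOINT = "https://en.wikipedia.org/w/api.php"


def build_extract_url(title, endpoint=API_ENDPOINT):
    """Build a query URL requesting a plain-text extract for one page."""
    params = {
        "action": "query",
        "prop": "extracts",
        "explaintext": 1,  # plain text instead of limited HTML
        "titles": title,
        "format": "json",
    }
    return endpoint + "?" + urlencode(params)


# "Sandbox list test" is a made-up page title for this example.
print(build_extract_url("Sandbox list test"))
```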

Restricted Application added a subscriber: Aklapper. Sep 16 2015, 6:26 PM

Source page wikitext:

This is a bullet list.
* One
* Two

This is a numbered list.
# One
# Two
# Three

Current output for plaintext extracts:

This is a bullet list.
One
Two

This is a numbered list.
One
Two
Three

Desired output:

This is a bullet list.
* One
* Two

This is a numbered list.
1. One
2. Two
3. Three
Jdlrobson updated the task description. Sep 16 2015, 6:42 PM
Jdlrobson set Security to None.

Thanks MaxSem. Are there any applications that surface this? For things like Hovercards I'm not sure lists should even show up in extracts.

Hovercards want so much different stuff that it feels like they should get their own parser. However, TE is a generic text extraction extension heavily used by third parties, so it should extract everything in a way that makes more sense for a general audience.

kaldari removed a subscriber: kaldari. Sep 16 2015, 10:36 PM
Jdlrobson triaged this task as Low priority. Sep 18 2015, 8:00 PM

Agreed. Given all the bugs I'm seeing I'm not sure if TextExtracts is a good fit for Hovercards...

phuedx added a subscriber: phuedx. Jun 22 2017, 12:20 PM

^ This bug doesn't strictly impact plain-text previews because, as @Jdlrobson said above, lists shouldn't show up in them.

Jdlrobson renamed this task from Extracts should handle lists properly to Plaintext extracts could append '*' and '#' for lists. Jul 13 2017, 6:58 PM