Maniphest T195845

Media: Handle HTML entities in plaintext descriptions
Closed, Declined · Public

Assigned To

Authored By

	• Mholloway
	May 29 2018, 1:57 PM

Description

Looks like we'll need to decode HTML entities before the striptags pass when generating the plaintext file descriptions.

Report (https://github.com/wikimedia/wikipedia-ios/pull/2310#issuecomment-391008942):

@mdholloway I ran across this description while testing:
&lt;a href=\"<a rel=\"nofollow\" class=\"external free\" href=\"http://www.archivessearch.qld.gov.au/Image/DigitalImageDetails.aspx?ImageId=7998\">http://www.archivessearch.qld.gov.au/Image/DigitalImageDetails.aspx?ImageId=7998</a>\" rel=\"nofollow\"&gt;Queensland State Archives Digital Image ID 7998&lt;/a&gt;
via https://en.wikipedia.org/api/rest_v1/page/media/Diana,_Princess_of_Wales

Then I tried:
> striptags("&lt;a href=\"<a rel=\"nofollow\" class=\"external free\" href=\"http://www.archivessearch.qld.gov.au/Image/DigitalImageDetails.aspx?ImageId=7998\">http://www.archivessearch.qld.gov.au/Image/DigitalImageDetails.aspx?ImageId=7998</a>\" rel=\"nofollow\"&gt;Queensland State Archives Digital Image ID 7998&lt;/a&gt;")
and got
&lt;a href="http://www.archivessearch.qld.gov.au/Image/DigitalImageDetails.aspx?ImageId=7998" rel="nofollow"&gt;Queensland State Archives Digital Image ID 7998&lt;/a&gt;
I don't think we can do much about them explicitly encoding HTML tags into the plaintext description using entities (multiple passes of striptags?), but might make sense to at least expand the HTML entities to get <a href="http://www.archivessearch.qld.gov.au/Image/DigitalImageDetails.aspx?ImageId=7998" rel="nofollow">Queensland State Archives Digital Image ID 7998</a>
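A minimal sketch of the ordering discussed above (decode entities first, then strip tags). The helpers `decodeEntities`, `stripTags`, and `toPlainText` are hypothetical stand-ins written for illustration; they are not the `striptags` package or the mobileapps implementation, and the entity table is deliberately tiny.

```javascript
// Hypothetical, simplified helpers for illustration only.

// Small subset of named entities; a real decoder would cover the full HTML list.
const NAMED_ENTITIES = { lt: '<', gt: '>', quot: '"', apos: "'", amp: '&' };

// Decode numeric and a few named entities in a single pass
// (so "&amp;lt;" becomes "&lt;", not "<").
function decodeEntities(text) {
  return text.replace(/&(#[xX][0-9a-fA-F]+|#[0-9]+|[a-zA-Z]+);/g, (match, body) => {
    if (body[0] === '#') {
      const isHex = body[1] === 'x' || body[1] === 'X';
      const codePoint = parseInt(body.slice(isHex ? 2 : 1), isHex ? 16 : 10);
      // Leave out-of-range references untouched instead of throwing.
      return codePoint <= 0x10ffff ? String.fromCodePoint(codePoint) : match;
    }
    return Object.prototype.hasOwnProperty.call(NAMED_ENTITIES, body)
      ? NAMED_ENTITIES[body]
      : match; // unknown named entity: keep as-is
  });
}

// Naive tag stripper: removes anything that looks like <tag ...> or </tag>.
function stripTags(html) {
  return html.replace(/<\/?[a-zA-Z][^>]*>/g, '');
}

// Decode first, then strip, so entity-encoded tags are removed as well.
function toPlainText(description) {
  return stripTags(decodeEntities(description));
}

const sample = '&lt;b&gt;Bold&lt;/b&gt; caption';
console.log(stripTags(sample));   // entity-encoded tags survive stripping alone
console.log(toPlainText(sample)); // Bold caption
```

The trade-off is that text an editor escaped on purpose (for example, a literal tag shown in a description) would also be stripped, and nested markup like the sample in the report may still leave attribute fragments behind with a stripper this naive.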

Details

	Subject	Repo	Branch	Lines +/-
	Media: Decode HTML entities in commons metadata before stripping tags	mediawiki/services/mobileapps	master	+22 -2

Event Timeline

• Mholloway created this task. May 29 2018, 1:57 PM

Restricted Application added a subscriber: Aklapper. May 29 2018, 1:57 PM

• Mholloway updated the task description. May 29 2018, 2:00 PM

• Mholloway claimed this task. May 29 2018, 4:57 PM

• Mholloway moved this task from To Do to Doing on the Product-Infrastructure-Team-Backlog-Deprecated (Kanban) board.

Change 436281 had a related patch set uploaded (by Mholloway; owner: Mholloway):
[mediawiki/services/mobileapps@master] Media: Decode HTML entities in commons metadata before stripping tags

https://gerrit.wikimedia.org/r/436281

gerritbot added a project: Patch-For-Review. May 30 2018, 12:30 PM

• Mholloway moved this task from Doing to Code Review on the Product-Infrastructure-Team-Backlog-Deprecated (Kanban) board. May 30 2018, 12:32 PM

Holding this for discussion at next RI weekly meeting.

We discussed and decided this looks like an editor error, and we shouldn't implement anything for it.

Change 436281 abandoned by Mholloway:
Media: Decode HTML entities in commons metadata before stripping tags

Reason:
We discussed and decided this looks like an editor error, and we shouldn't implement anything for it.

https://gerrit.wikimedia.org/r/436281

• Mholloway closed this task as Declined. Jun 11 2018, 5:07 PM

• Vvjjkkii renamed this task from Media: Handle HTML entities in plaintext descriptions to 62baaaaaaa. Jul 1 2018, 1:07 AM

• Vvjjkkii reopened this task as Open.

• Vvjjkkii removed • Mholloway as the assignee of this task.

• Vvjjkkii triaged this task as High priority.

• Vvjjkkii added projects: CheckUser, Connected-Open-Heritage-Batch-uploads (RAÄ-KMB_1_2017-02), Tamil-Sites, Gamepress, Hashtags, Jade, KartoEditor, Language-2018-Apr-June, New-Editor-Experiences, Mail, TCB-Team (now WMDE-TechWish).

• Vvjjkkii updated the task description.

• Vvjjkkii removed subscribers: gerritbot, Aklapper.

CommunityTechBot renamed this task from 62baaaaaaa to Media: Handle HTML entities in plaintext descriptions. Jul 2 2018, 3:26 PM

CommunityTechBot closed this task as Declined.

CommunityTechBot assigned this task to • Mholloway.

CommunityTechBot raised the priority of this task from High to Needs Triage.

CommunityTechBot updated the task description.

CommunityTechBot removed projects: TCB-Team (now WMDE-TechWish), Mail, New-Editor-Experiences, Language-2018-Apr-June, KartoEditor, Jade, Hashtags, Gamepress, Tamil-Sites, Connected-Open-Heritage-Batch-uploads (RAÄ-KMB_1_2017-02), CheckUser.

CommunityTechBot added subscribers: gerritbot, Aklapper.
