Page MenuHomePhabricator

TextExtracts should strip scripts
Closed, ResolvedPublic

Description

Arguably they should not be there to begin with, but this has affected at least two extensions, so probably worth dealing with here:

T105924: Graph extension should exclude scripts tags so they will not be parsed by TextExtracts

T107170: Flow: Data Model JavaScript, including tokens, appears in TextExtracts and HoverCards

It's not always a security issue, but always a UX one.

Event Timeline

Mattflaschen-WMF raised the priority of this task from to Needs Triage.
Mattflaschen-WMF updated the task description. (Show Details)
Restricted Application changed the visibility from "Public (No Login Required)" to "Custom Policy". · View Herald TranscriptJul 28 2015, 8:41 PM
Restricted Application changed the edit policy from "All Users" to "Custom Policy". · View Herald Transcript
Restricted Application added a project: acl*security. · View Herald Transcript
Mattflaschen-WMF renamed this task from TextExtracts should script scripts to TextExtracts should strip scripts.Jul 28 2015, 8:45 PM

Only for Flow, not T105924: Graph extension should exclude scripts tags so they will not be parsed by TextExtracts if I understand correctly that the "Graphs" page (see F192692) is wikitext content model. Also not the general issue.

@Mattflaschen-WMF, would you be willing to take a pass at a patch?

Just found this on my travels. Is this still a problem? Is it really low? Is this a problem with the Graph extension or TextExtracts or both?

Jdlrobson raised the priority of this task from Low to High.Jun 23 2017, 6:12 PM
Jdlrobson moved this task from Needs Prioritization to Upcoming on the Web-Team-Backlog board.
Jdlrobson added subscribers: dpatrick, Bawolff.

I can't replicate this in the wild, but I see how this is happening.
This relates to T167852 but is much more easily fixed. We need to update wgto include script tags so they get stripped from the extract.

My plan is to:

  1. Update on production:
  2. Update in extension (change default - will submit and merge patch after SWAT)

@Bawolff @dpatrick that sound like a plan?

Postponed from today's swat, pending some discussion of how to not do it via InitialiseSettings

Postponed from today's swat, pending some discussion of how to not do it via InitialiseSettings

Where did that discussion take place?

Postponed from today's swat, pending some discussion of how to not do it via InitialiseSettings

Where did that discussion take place?

It hasn't yet, I postponed it because we need a discussion about how to do it instead :)

Mibad. I completely missed the "pending".

👍

@Mattflaschen-WMF, would you be willing to take a pass at a patch?

Sorry, didn't see this (only happened to see this patch on the Phabricator web interface). I'm not checking Phabricator mail as much these days, so IRC/direct email is better for something like that.

I can't replicate this in the wild, but I see how this is happening.
This relates to T167852 but is much more easily fixed. We need to update wgto include script tags so they get stripped from the extract.

My plan is to:

  1. Update on production:
  2. Update in extension (change default - will submit and merge patch after SWAT)

@Bawolff @dpatrick that sound like a plan?

My understanding is that this is theoretical, and we're not currently aware of any examples of WMF extensions that would cause this?

I think stripping <script> tags is generally a good idea and similar to how the extension already removes <style>. To be really paranoid, I would add <input> (If an extension added a form to a wikipage, there could be a hidden input tag containing a token).

To be further paranoid, we may want ExtractFormatter::filterContent() to also remove any attributes starting with data or on. (data in case they contain sensitive data. on for scripts)


Instead of InitialiseSettings.php, I would suggest just adding the additional items to extension.json directly, and deploying that. I think its fine to do this as a public gerrit patch, provided the patch is merged pretty immediately after the patch is made public, and an announcement email is sent to wikitech-l/mediawiki-l advising people to upgrade. If for whatever reason we can't do the public announcement immediately after deploying, then it should be deployed as a secret security patch until we can do that announcement.

Going to swat this in 7 minutes time and send an e-mail to wikitech-l & mediawiki-l advising 3rd parties to upgrade.

@Bawolff can you make this public?

The following fix has been deployed: https://gerrit.wikimedia.org/r/362252
I've sent an email to mailing lists.

Bawolff changed the visibility from "Custom Policy" to "Public (No Login Required)".Jun 29 2017, 8:24 PM
Bawolff changed the edit policy from "Custom Policy" to "All Users".

@Bawolff can you make this public?

Done

Bawolff assigned this task to Jdlrobson.

Change 362390 had a related patch set uploaded (by Brian Wolff; owner: Jdlrobson):
[mediawiki/extensions/TextExtracts@REL1_29] Play things safe when stripping HTML

https://gerrit.wikimedia.org/r/362390

Change 362391 had a related patch set uploaded (by Brian Wolff; owner: Jdlrobson):
[mediawiki/extensions/TextExtracts@REL1_28] Play things safe when stripping HTML

https://gerrit.wikimedia.org/r/362391

Change 362392 had a related patch set uploaded (by Brian Wolff; owner: Jdlrobson):
[mediawiki/extensions/TextExtracts@REL1_27] Play things safe when stripping HTML

https://gerrit.wikimedia.org/r/362392

Change 362390 merged by jenkins-bot:
[mediawiki/extensions/TextExtracts@REL1_29] Play things safe when stripping HTML

https://gerrit.wikimedia.org/r/362390

Change 362392 merged by jenkins-bot:
[mediawiki/extensions/TextExtracts@REL1_27] Play things safe when stripping HTML

https://gerrit.wikimedia.org/r/362392

Change 362391 merged by jenkins-bot:
[mediawiki/extensions/TextExtracts@REL1_28] Play things safe when stripping HTML

https://gerrit.wikimedia.org/r/362391