Description
Details
Related Objects
- Mentioned In
- rETEXda9f4fe1e72c: SECURITY: Disallow extracts for non-wikitext for now.
rETEX4c6e6c77e6ae: SECURITY: Disallow extracts for non-wikitext for now.
rMEXT2717821235a5: Updated mediawiki/extensions Project: mediawiki/extensions/TextExtracts…
rETEX9e524959ff04: SECURITY: Disallow extracts for non-wikitext for now.
rETEXf5c114c571e0: SECURITY: Disallow extracts for non-wikitext for now.
rETEX63b358fca264: SECURITY: Disallow extracts for non-wikitext for now.
rEFLW106763badb5c: SECURITY: Strip edit tokens in mw.flow.data
rMEXT0036fa674867: Updated mediawiki/extensions Project: mediawiki/extensions/TextExtracts…
rMEXTa7e6fd075f13: Updated mediawiki/extensions Project: mediawiki/extensions/Flow…
rEFLW1fa6a191b03b: SECURITY: Strip edit tokens in mw.flow.data
rMW9b14ded41d3c: Updated mediawiki/core Project: mediawiki/extensions/Flow…
rMW3f56559b9c96: Updated mediawiki/core Project: mediawiki/extensions/TextExtracts…
T107206: TextExtracts should strip scripts - Mentioned Here
- T107200: Good Flow experience with HoverCards and TextExtracts
T105924: Graph extension should exclude scripts tags so they will not be parsed by TextExtracts
Event Timeline
This is a temporary fix to strip tokens.
After this is deployed, we'll discuss publicly. Ideas include mw.config and addInlineScript.
The first part is deployed. Now I'm figuring out what to do about cache invalidation.
I don't think it's in parser cache (isParserCacheSupported), because that's opt in and we didn't opt in. I'm not sure about Varnish, but it definitely is in memcached.
The patch to evict it from memcached:
We'll use it once now to evict the tokens, and again later, since I think the rest of the data is problematic from a moderation standpoint.
It doesn't appear to be cached at Varnish, though I'm not really clear why. Compare:
% curl -I 'https://en.wikipedia.org/w/api.php?action=query&prop=extracts&exchars=300&titles=Wikipedia_talk:Flow/Developer_test_page&format=json' HTTP/1.1 200 OK Server: nginx/1.9.3 Date: Wed, 29 Jul 2015 00:35:57 GMT Content-Type: application/json; charset=utf-8 Connection: keep-alive X-Powered-By: HHVM/3.6.5 X-Content-Type-Options: nosniff Cache-control: private, must-revalidate, max-age=0 X-Frame-Options: SAMEORIGIN Vary: Accept-Encoding,X-Forwarded-Proto,Cookie X-Varnish: 1276715581, 3944614347 Via: 1.1 varnish, 1.1 varnish Age: 0 X-Cache: cp1065 miss (0), cp1065 frontend miss (0) Strict-Transport-Security: max-age=31536000; includeSubDomains; preload Set-Cookie: GeoIP=:::::v6; Path=/; Domain=.wikipedia.org X-Analytics: https=1 Set-Cookie: WMF-Last-Access=29-Jul-2015;Path=/;HttpOnly;Expires=Sun, 30 Aug 2015 00:00:00 GMT
% curl -I 'https://meta.wikimedia.org/w/api.php?action=jsonschema&revid=12518424' HTTP/1.1 200 OK Server: nginx/1.9.3 Date: Wed, 29 Jul 2015 00:36:00 GMT Content-Type: application/json; charset=utf-8 Connection: keep-alive X-Powered-By: HHVM/3.6.5 X-Content-Type-Options: nosniff Cache-control: max-age=300, s-maxage=300, public X-Frame-Options: DENY Vary: Accept-Encoding,X-Forwarded-Proto,Cookie Expires: Wed, 29 Jul 2015 00:38:11 GMT Last-Modified: Fri, 26 Jun 2015 21:32:51 GMT X-Varnish: 1625007478 1624935967, 3944622490 3944340139 Via: 1.1 varnish, 1.1 varnish Age: 169 X-Cache: cp1066 hit (1), cp1065 frontend hit (2) Strict-Transport-Security: max-age=31536000 Set-Cookie: GeoIP=:::::v6; Path=/; Domain=.wikimedia.org X-Analytics: https=1 Set-Cookie: WMF-Last-Access=29-Jul-2015;Path=/;HttpOnly;Expires=Sun, 30 Aug 2015 00:00:00 GMT
Would it be easier to make TextExtracts refuse to give extracts for non-wikitext content model pages?
api.php doesn't send caching headers by default, unless you opt into them using a query parameter and tell it the caching timeout you want. I guess action=jsonschema overrides this behavior somehow, which is a relatively rare thing for API modules to do.
That's a good suggestion. It should be sufficient until we do T107200: Good Flow experience with HoverCards and TextExtracts. Even then, we would only need to run the eviction if the cache key didn't change.
I'll do this as a whitelist just in case there are some other content models that it should support.
I don't think the warning is necessary and will end up being confusing if extracts are requested for multiple pages (https://en.wikipedia.org/w/api.php?action=query&titles=Main%20Page|Wikipedia|Wikimedia&prop=extracts&exchars=40&exlimit=10&exintro=1).
I'd still add a warning: giving no extract even though it was requested would also be confusing. But maybe the warning should be more clear about what page it's returning an empty extract for. LGTM otherwise.
Updated to specify which page is affected. It works if multiple pages are requested and some are supported and some are not. The warning shows up once for each non-supported page.
This is not needed now. It will only be needed later if we start enabling flow-board with the same cache key.
disallowNonWikitext_T107170_v2.patch is deployed to 1.26wmf16. My priority now is making sure both patches (but especially the TextExtracts one) either:
a. Make it into master so they get the next branch cut (simpler)
b. Get propagated properly to the 1.26wmf17.
You should now see output like https://en.wikipedia.org/w/api.php?action=query&prop=extracts&exchars=300&titles=Wikipedia%20talk:Flow/Developer%20test%20page&format=jsonfm&callback=foo or https://www.mediawiki.org/w/api.php?action=query&prop=extracts&exchars=300&exintro=true&titles=Talk:Sandbox|User%20talk:Mattflaschen&format=jsonfm&callback=foo&exlimit=2 regardless of whether it's in cache.
The output is something like:
"Wikipedia_talk:Flow/Developer_test_page has content model 'flow-board', which is not supported; returning an empty extract."
then either an empty extract or "..." depending on TextExtracts mode.
a. Make it into master so they get the next branch cut (simpler)
b. Get propagated properly to the 1.26wmf17.
The patch should be up on tin in /srv/patches/extensions/TextExtracts (and Flow). So they'll automatically get applied to wmf17 if that branch cut happens before the patches are merged into master.
It looks like there are a few non-WMF wikis using it, and it's listed on mw.o as "stable". @demon / @MaxSem, do you have a preference if we just push the patch to master vs making this part of the next release?
Why write smart invalidation scripts in the first place when you can just update the TE cache key schema? Cache hit rate isn't that high anyway.
Next MediaWiki release (where we'll also release OAuth patches, and patches for any bundled extensions).
I remain convinced that blocking a release for non-bundled extensions is a bad idea, so JFDI :)
It is. (Would you mind checking if it's the right format to be automatically applied?)
It looks like there are a few non-WMF wikis using it, and it's listed on mw.o as "stable". @demon / @MaxSem, do you have a preference if we just push the patch to master vs making this part of the next release?
We did discuss doing that (I realize it can only cache latest version anyway), but decided not to bust the cache.
It's a moot point, though, since we decided to use @Legoktm's approach instead (just not returning anything for non-wikitext). The eviction script has not been run in production, nor are we planning on doing so anytime soon.
It is. (Would you mind checking if it's the right format to be automatically applied?)
Both look good. Thanks!
@csteipp said wikitech-l is fine. We just need to decide whether to merge before or after the branch cut.
Change 229039 merged by jenkins-bot:
SECURITY: Disallow extracts for non-wikitext for now.
Change 229048 had a related patch set uploaded (by Mattflaschen):
SECURITY: Disallow extracts for non-wikitext for now.
Change 229050 had a related patch set uploaded (by Mattflaschen):
SECURITY: Disallow extracts for non-wikitext for now.
Change 229055 had a related patch set uploaded (by Mattflaschen):
SECURITY: Disallow extracts for non-wikitext for now.
Can someone take a look at the backports? They're failing because of some package.json problem. I'm not sure if it's a known CI issue and if I should force it.
Change 229055 merged by Legoktm:
SECURITY: Disallow extracts for non-wikitext for now.
Change 229050 merged by Legoktm:
SECURITY: Disallow extracts for non-wikitext for now.
Change 229048 merged by Legoktm:
SECURITY: Disallow extracts for non-wikitext for now.
Checked in betalabs
http://en.wikipedia.beta.wmflabs.org/w/api.php?action=query&prop=extracts&exchars=300&titles=Talk:ET1&format=json&callback=foo
/**/foo({"batchcomplete":"","warnings":{"extracts":{"*":"Talk:ET1 has content model 'flow-board', which is not supported; returning an empty extract."}},"query":{"pages":{"95572":{"pageid":95572,"ns":1,"title":"Talk:ET1","extract":"..."}}}})
vs a page with page_content_model='wikitext':
http://en.wikipedia.beta.wmflabs.org/w/api.php?action=query&prop=extracts&exchars=300&titles=Mavetuna22&format=json&callback=foo
/**/foo({"batchcomplete":"","query":{"pages":{"81180":{"pageid":81180,"ns":0,"title":"Mavetuna22","extract":"<h2><span id=\"English_as_a_global_language\">English as a global language</span></h2>\n<p>English has ceased to be an \"English language\" in the sense of belonging only to people who are ethnically English. Use of English is growing country-by-country internally and for international communication</p>..."}}}})