Page MenuHomePhabricator

Flow: Data Model JavaScript, including tokens, appears in TextExtracts and HoverCards
Closed, ResolvedPublic

Event Timeline

There are a very large number of changes, so older changes are hidden. Show Older Changes
Catrope changed Security from None to Software security bug.
Restricted Application changed the visibility from "Public (No Login Required)" to "Custom Policy". · View Herald TranscriptJul 28 2015, 7:27 PM
Restricted Application changed the edit policy from "All Users" to "Custom Policy". · View Herald Transcript

This is a temporary fix to strip tokens.

After this is deployed, we'll discuss publicly. Ideas include mw.config and addInlineScript.

In T107170#1489490, @Mattflaschen wrote:

This is a temporary fix to strip tokens.

After this is deployed, we'll discuss publicly. Ideas include mw.config and addInlineScript.

LGTM

Mattflaschen-WMF renamed this task from Data Model JavaScript appears in HoverCards on Beta to Data Model JavaScript, including tokens, appears in TextExtracts and HoverCards.Jul 28 2015, 9:14 PM

The first part is deployed. Now I'm figuring out what to do about cache invalidation.

I don't think it's in parser cache (isParserCacheSupported), because that's opt in and we didn't opt in. I'm not sure about Varnish, but it definitely is in memcached.

The patch to evict it from memcached:

We'll use it once now to evict the tokens, and again later, since I think the rest of the data is problematic from a moderation standpoint.

It doesn't appear to be cached at Varnish, though I'm not really clear why. Compare:

% curl -I 'https://en.wikipedia.org/w/api.php?action=query&prop=extracts&exchars=300&titles=Wikipedia_talk:Flow/Developer_test_page&format=json'
HTTP/1.1 200 OK
Server: nginx/1.9.3
Date: Wed, 29 Jul 2015 00:35:57 GMT
Content-Type: application/json; charset=utf-8
Connection: keep-alive
X-Powered-By: HHVM/3.6.5
X-Content-Type-Options: nosniff
Cache-control: private, must-revalidate, max-age=0
X-Frame-Options: SAMEORIGIN
Vary: Accept-Encoding,X-Forwarded-Proto,Cookie
X-Varnish: 1276715581, 3944614347
Via: 1.1 varnish, 1.1 varnish
Age: 0
X-Cache: cp1065 miss (0), cp1065 frontend miss (0)
Strict-Transport-Security: max-age=31536000; includeSubDomains; preload
Set-Cookie: GeoIP=:::::v6; Path=/; Domain=.wikipedia.org
X-Analytics: https=1
Set-Cookie: WMF-Last-Access=29-Jul-2015;Path=/;HttpOnly;Expires=Sun, 30 Aug 2015 00:00:00 GMT
% curl -I 'https://meta.wikimedia.org/w/api.php?action=jsonschema&revid=12518424'
HTTP/1.1 200 OK
Server: nginx/1.9.3
Date: Wed, 29 Jul 2015 00:36:00 GMT
Content-Type: application/json; charset=utf-8
Connection: keep-alive
X-Powered-By: HHVM/3.6.5
X-Content-Type-Options: nosniff
Cache-control: max-age=300, s-maxage=300, public
X-Frame-Options: DENY
Vary: Accept-Encoding,X-Forwarded-Proto,Cookie
Expires: Wed, 29 Jul 2015 00:38:11 GMT
Last-Modified: Fri, 26 Jun 2015 21:32:51 GMT
X-Varnish: 1625007478 1624935967, 3944622490 3944340139
Via: 1.1 varnish, 1.1 varnish
Age: 169
X-Cache: cp1066 hit (1), cp1065 frontend hit (2)
Strict-Transport-Security: max-age=31536000
Set-Cookie: GeoIP=:::::v6; Path=/; Domain=.wikimedia.org
X-Analytics: https=1
Set-Cookie: WMF-Last-Access=29-Jul-2015;Path=/;HttpOnly;Expires=Sun, 30 Aug 2015 00:00:00 GMT
In T107170#1490301, @Mattflaschen wrote:

The patch to evict it from memcached:

We'll use it once now to evict the tokens, and again later, since I think the rest of the data is problematic from a moderation standpoint.

Would it be easier to make TextExtracts refuse to give extracts for non-wikitext content model pages?

In T107170#1490354, @Mattflaschen wrote:

It doesn't appear to be cached at Varnish, though I'm not really clear why. Compare:

api.php doesn't send caching headers by default, unless you opt into them using a query parameter and tell it the caching timeout you want. I guess action=jsonschema overrides this behavior somehow, which is a relatively rare thing for API modules to do.

Would it be easier to make TextExtracts refuse to give extracts for non-wikitext content model pages?

That's a good suggestion. It should be sufficient until we do T107200: Good Flow experience with HoverCards and TextExtracts. Even then, we would only need to run the eviction if the cache key didn't change.

I'll do this as a whitelist just in case there are some other content models that it should support.

DannyH triaged this task as Unbreak Now! priority.Jul 29 2015, 11:15 PM

Disallowing non-wikitext content models:

Please review.

In T107170#1494041, @Mattflaschen wrote:

Disallowing non-wikitext content models:

Please review.

I don't think the warning is necessary and will end up being confusing if extracts are requested for multiple pages (https://en.wikipedia.org/w/api.php?action=query&titles=Main%20Page|Wikipedia|Wikimedia&prop=extracts&exchars=40&exlimit=10&exintro=1).

I'd still add a warning: giving no extract even though it was requested would also be confusing. But maybe the warning should be more clear about what page it's returning an empty extract for. LGTM otherwise.

Updated to specify which page is affected. It works if multiple pages are requested and some are supported and some are not. The warning shows up once for each non-supported page.

In T107170#1495641, @Mattflaschen wrote:

Updated to specify which page is affected. It works if multiple pages are requested and some are supported and some are not. The warning shows up once for each non-supported page.

+2

In T107170#1490301, @Mattflaschen wrote:

The patch to evict it from memcached:

We'll use it once now to evict the tokens, and again later, since I think the rest of the data is problematic from a moderation standpoint.

This is not needed now. It will only be needed later if we start enabling flow-board with the same cache key.

disallowNonWikitext_T107170_v2.patch is deployed to 1.26wmf16. My priority now is making sure both patches (but especially the TextExtracts one) either:

a. Make it into master so they get the next branch cut (simpler)
b. Get propagated properly to the 1.26wmf17.

You should now see output like https://en.wikipedia.org/w/api.php?action=query&prop=extracts&exchars=300&titles=Wikipedia%20talk:Flow/Developer%20test%20page&format=jsonfm&callback=foo or https://www.mediawiki.org/w/api.php?action=query&prop=extracts&exchars=300&exintro=true&titles=Talk:Sandbox|User%20talk:Mattflaschen&format=jsonfm&callback=foo&exlimit=2 regardless of whether it's in cache.

The output is something like:

"Wikipedia_talk:Flow/Developer_test_page has content model 'flow-board', which is not supported; returning an empty extract."

then either an empty extract or "..." depending on TextExtracts mode.

a. Make it into master so they get the next branch cut (simpler)
b. Get propagated properly to the 1.26wmf17.

The patch should be up on tin in /srv/patches/extensions/TextExtracts (and Flow). So they'll automatically get applied to wmf17 if that branch cut happens before the patches are merged into master.

It looks like there are a few non-WMF wikis using it, and it's listed on mw.o as "stable". @demon / @MaxSem, do you have a preference if we just push the patch to master vs making this part of the next release?

Why write smart invalidation scripts in the first place when you can just update the TE cache key schema? Cache hit rate isn't that high anyway.

It looks like there are a few non-WMF wikis using it, and it's listed on mw.o as "stable". @demon / @MaxSem, do you have a preference if we just push the patch to master vs making this part of the next release?

Next release of what?

Next release of what?

Next MediaWiki release (where we'll also release OAuth patches, and patches for any bundled extensions).

I remain convinced that blocking a release for non-bundled extensions is a bad idea, so JFDI :)

a. Make it into master so they get the next branch cut (simpler)
b. Get propagated properly to the 1.26wmf17.

The patch should be up on tin in /srv/patches/extensions/TextExtracts (and Flow). So they'll automatically get applied to wmf17 if that branch cut happens before the patches are merged into master.

It is. (Would you mind checking if it's the right format to be automatically applied?)

It looks like there are a few non-WMF wikis using it, and it's listed on mw.o as "stable". @demon / @MaxSem, do you have a preference if we just push the patch to master vs making this part of the next release?

Why write smart invalidation scripts in the first place when you can just update the TE cache key schema? Cache hit rate isn't that high anyway.

We did discuss doing that (I realize it can only cache latest version anyway), but decided not to bust the cache.

It's a moot point, though, since we decided to use @Legoktm's approach instead (just not returning anything for non-wikitext). The eviction script has not been run in production, nor are we planning on doing so anytime soon.

It is. (Would you mind checking if it's the right format to be automatically applied?)

Both look good. Thanks!

We still need to decide whether to merge to master now (both (Flow, TextExtracts) patches, one, or neither).

@demon says JFDI. @csteipp, @MaxSem?

plusone

But needs an announcement afterwards.

Plus one meaning you support merging both right now?

JFDI, for merging both into master now.

Should I send the announcement? To MediaWiki-l?

@csteipp said wikitech-l is fine. We just need to decide whether to merge before or after the branch cut.

Mattflaschen-WMF changed Security from Software security bug to None.Aug 4 2015, 12:06 AM
csteipp changed the visibility from "Custom Policy" to "Public (No Login Required)".Aug 4 2015, 12:09 AM
csteipp changed the edit policy from "Custom Policy" to "All Users".

Change 229039 merged by jenkins-bot:
SECURITY: Disallow extracts for non-wikitext for now.

https://gerrit.wikimedia.org/r/229039

Change 229040 merged by jenkins-bot:
SECURITY: Strip edit tokens in mw.flow.data

https://gerrit.wikimedia.org/r/229040

Change 229048 had a related patch set uploaded (by Mattflaschen):
SECURITY: Disallow extracts for non-wikitext for now.

https://gerrit.wikimedia.org/r/229048

Change 229050 had a related patch set uploaded (by Mattflaschen):
SECURITY: Disallow extracts for non-wikitext for now.

https://gerrit.wikimedia.org/r/229050

Change 229055 had a related patch set uploaded (by Mattflaschen):
SECURITY: Disallow extracts for non-wikitext for now.

https://gerrit.wikimedia.org/r/229055

Can someone take a look at the backports? They're failing because of some package.json problem. I'm not sure if it's a known CI issue and if I should force it.

Change 229055 merged by Legoktm:
SECURITY: Disallow extracts for non-wikitext for now.

https://gerrit.wikimedia.org/r/229055

Change 229050 merged by Legoktm:
SECURITY: Disallow extracts for non-wikitext for now.

https://gerrit.wikimedia.org/r/229050

Change 229048 merged by Legoktm:
SECURITY: Disallow extracts for non-wikitext for now.

https://gerrit.wikimedia.org/r/229048

In T107170#1505202, @Mattflaschen wrote:

Can someone take a look at the backports? They're failing because of some package.json problem. I'm not sure if it's a known CI issue and if I should force it.

Force merged them.

Checked in betalabs
http://en.wikipedia.beta.wmflabs.org/w/api.php?action=query&prop=extracts&exchars=300&titles=Talk:ET1&format=json&callback=foo

/**/foo({"batchcomplete":"","warnings":{"extracts":{"*":"Talk:ET1 has content model 'flow-board', which is not supported;
 returning an empty extract."}},"query":{"pages":{"95572":{"pageid":95572,"ns":1,"title":"Talk:ET1","extract":"..."}}}})

vs a page with page_content_model='wikitext':
http://en.wikipedia.beta.wmflabs.org/w/api.php?action=query&prop=extracts&exchars=300&titles=Mavetuna22&format=json&callback=foo

/**/foo({"batchcomplete":"","query":{"pages":{"81180":{"pageid":81180,"ns":0,"title":"Mavetuna22","extract":"<h2><span 
id=\"English_as_a_global_language\">English as a global language</span></h2>\n<p>English has ceased to be an 
\"English language\" in the sense of belonging only to people who are ethnically English. Use of English is growing 
country-by-country internally and for international communication</p>..."}}}})
Mattflaschen-WMF renamed this task from Data Model JavaScript, including tokens, appears in TextExtracts and HoverCards to Flow: Data Model JavaScript, including tokens, appears in TextExtracts and HoverCards.Jun 28 2017, 7:20 PM