Page MenuHomePhabricator

RfC: HTML templating library (backend and frontend)
Closed, ResolvedPublic

Description

We need to agree on Requests for comment/HTML templating library. Let's use this task to follow the status of the RfC process and to identify blockers.

Details

Reference
fl572
Related Gerrit Patches:

Event Timeline

flimport raised the priority of this task from to Needs Triage.Sep 12 2014, 1:46 AM
flimport added a project: Architecture.
flimport set Reference to fl572.
Qgil triaged this task as Low priority.Sep 21 2014, 2:08 PM
Qgil added a subscriber: Qgil.
Qgil edited projects, added TechCom-RFC; removed Architecture.Oct 22 2014, 8:45 PM
tstarling set Security to None.
tstarling moved this task from Inbox to Decided on the TechCom board.Nov 20 2014, 6:25 AM

Change 178385 had a related patch set uploaded (by Kaldari):
Adding mustache.php to vendor repo for security review

https://gerrit.wikimedia.org/r/178385

Patch-For-Review

Change 178385 abandoned by Kaldari:
Adding mustache.php to vendor repo for security review

Reason:
A bit premature. Looks like Lightncandy would be a better choice.

https://gerrit.wikimedia.org/r/178385

Kaldari's https://gerrit.wikimedia.org/r/#/c/181708/ patch added lightncandy to core/vendor. There's no documentation or mention in RELEASE-NOTES-1.25.

I briefly discussed with @TrevorParscal and @Catrope and have a few thoughts. Based on I28cd13d4d11 as demonstration of how LightnCandy could be used in MediaWiki, and a quick scan of upstream zordius/lightncandy.

LightnCandy was brought to the Front-end Standards Group as candidate for interacting with Mustache files in PHP (recommended by Shahyar, based on Flow's use of it). The HTML templating RFC has not been on the agenda since that initial pitch. I think there's been a general vibe indicating we're ready to start using it, but (as far as I know) FSG did not actually look into LightnCandy yet. Sorry about the confusion around this.

MediaWiki integration

  • Committing compiled PHP files to Git is not an option (for maintainability and overall sanity; a good system ought not to require that).
  • The external library should be wrapped in a (thin) class to ensure we have a manageable API surface that allows for the implementation to change in the future. Exposing and depending only on standard Mustache features.

LightnCandy

General

The library seems relatively immature. With a small community. At glance, the code base appears of suboptimal code quality. The small number of immediate issues we found seem easily fixed, however the nature of these issues worry me. As well as some aspects of how PHP is used (e.g. use of htmlentities). The lead maintainer is quite active and responsive, but I don't see many other contributors.

It provides non-standard Mustache features we agreed not to use (as well as support for Handlebars). It has a large number of configuration flags. Including a mode that doesn't expose these non-standard features, which is nice. It providing a superset of our requirements is obviously not a criticism. But it's worth noting these features may have influenced the Flow teams' choice at the time. Given we no longer use these features, and we're now looking for primetime use in MediaWiki, our requirements may call for a different choice.

It claims great performance. We should measure the actual performance using:

  • the software stack we use (HHVM),
  • test cases relevant to us,
  • appropriate context (e.g. a single web request handling lots of templates, representative of how MediaWiki with extensions might be),
  • the restricted configuration flags (may affect performance),
  • with and without caching.
Implementation

Looking at the surface I see a (potentially slow) parser that hides itself by producing PHP code (in a build step not measured) that inlines all necessary PHP statements into one long function written to a PHP file that is subsequently included (require) to call that function.

Looking deeper I do see a context object and use of tokens, but I haven't found a method that doesn't involve producing PHP code or evaluating it. The example in the README calls LightnCandy::compile() (returns a string of PHP code) and LightnCandy::prepare() (tries to write to /tmp, falling back to require() with data://text/plain,{urlencoded-phpcode}).

Next steps

If sticking with LightnCandy, at the very least we should:

  1. Measure actual performance using our software stack and configuration flags.
  2. Create (thin) wrapper class to cache php code as string in a standard ObjectCache (e.g. $wgMemc). This also fixes cache lingering (T89122).
  3. Lazy-populate cache (if the parser is fast enough to be run inside a web request; similar to ResourceLoader). Otherwise we'd have to do it prior to each deployment with a maintenance script (like for localisation cache).
  4. Fix LightnCandy to have a render method that will not write to /tmp (regardless of feature detection). Presumably this method would use the data-URI way instead.

Alternative

The current benchmarks use the require method (not data-URI evaluation). Our performance measuring should take the data-URI path.

If it turns out the parser is too slow to run in a web request (thus warranting building at deployment time), we should re-consider our options to verify whether LightnCandy still makes sense for us as there wouldn't be much reason to stick with it.

Looking at other options, I'd expect a Mustache parser to cache pure data (e.g. storing its context and token tree as an iterable structure). At run time, it merely retrieves the object from cache, iterates through the tokens to call relevant methods and substitute user input; and return a string of html. As example, mustache.js works this way. There'd be no code generation, eval(), data-uri evaluation, or other context switching.

I've written a new class called TemplateParser based on Krinkle's suggestions above. It provides a server-side interface to dynamically-compiled Mustache templates. The output PHP code gets cached twice: first in memcached (for 1 hour) and then also within the TemplateParser instance.

The only down-side to this implementation is that it requires eval(), but Chris said that he's OK with this as long as it is only being used on compiled code that has been generated from code-reviewed templates. The only alternative to using eval() is using pre-compiled templates, which Krinkle says is not an option.

I've also converted NoLocalSettings.php to use TemplateParser as a proof of concept.

Code review, security review, and general feedback would be appreciated: https://gerrit.wikimedia.org/r/#/c/187728/

When mobile first used lightncandy, they were using (or had talked about using) a deployment step like i18n to compile the templates. That keeps it out of gerrit, but prevents using eval() and a relatively untrusted store for the cache. Is there a reason that option isn't being considered?

Qgil removed a subscriber: Qgil.Feb 11 2015, 7:40 PM

@csteipp: That would probably be a good solution, but there are two issues:

  1. I have no idea how that would work for 3rd party users. I guess we could also make template compilation part of the packaging process and bundle some tools for managing them.
  2. I would not personally be able to build such a deployment process by myself. First, I doubt I have adequate permissions to muck with such stuff on tin. Second, my knowledge of how deployment and syncing actually works (technically) is quite limited. I would probably have to get some of Ori or Aaron's time to help with planning how it should work and actually implementing it.

If we want to go with compile-at-deploy (instead of lazy-cache), then it'd probably work like this:

  • For third-party users we can default to populating cache on-demand for individual templates (if a cache is configured). And no cache at all otherwise. This is also how localisation cache works. And also what @kaldari's current patch does.
  • In addition, we'd have a MediaWiki configuration flag to ensure no compilation is done at runtime (manualRecache=true). A maintenance script would iterate over registered templates and cache them. This means templates should be registered (like resource modules already), which seems like a good abstraction layer (instead of referencing file paths directly in code).

The additional compile-at-deploy route may not be necessary though. This route would only be needed if the parser is too slow to populate cache on-demand. Perhaps Flow can help us measure how long it would take to have them compiled at run time for a large page.

@csteipp: Regarding the "relatively untrusted store", would using APC instead of memcached help? I understand that it it is harder to attack since it isn't shared between the Apaches.

csteipp added a subscriber: aaron.Feb 12 2015, 5:57 PM

@csteipp: Regarding the "relatively untrusted store", would using APC instead of memcached help? I understand that it it is harder to attack since it isn't shared between the Apaches.

That would be better, although I think hhvm dropped object storage in APC (but @aaron should confirm that..).

If you go the integrity check route, it would be fairly simple to ensure that whoever put the code in memcache had at least read access to our private repo (which isn't the same as having write access to the appserver filesystem, but it's a step up from just having network access on the cluster). In TemplateParser.php, doing something like:

$code = $cache->get( $key );
global $some_secret; // maybe $wgSecretKey

if ( !$code ) {
  // do parsing to generate $code
  ...
  $code = hash_hmac( 'sha256', $code, $some_secret ) . $code; // prefix with 64 hex chars
  $cache->set( $key, $code);
}

$hmac = substr( $code, 0, 64 );
$code = substr( $code, 64 );
if ( $hmac === hash_hmac( 'sha256', $code, $some_secret ) ) {
  eval( $code );
} else {
  throw exception;
}

That would be better, although I think hhvm dropped object storage in APC (but @aaron should confirm that..).

I was wrong, it still has object storage. So yeah, that would make me more comfortable with this.

I wouldn't mind if you also did the integrity check, but if you're using APC, I'll try to add that as a generic layer in core and we can update this to use it when it's ready.

Change 187728 had a related patch set uploaded (by Kaldari):
Adding TemplateParser class providing interface to Mustache templates

https://gerrit.wikimedia.org/r/187728

Patch-For-Review

Change 187728 merged by jenkins-bot:
Adding TemplateParser class providing interface to Mustache templates

https://gerrit.wikimedia.org/r/187728

The server-side implementation is done (for now). Client-side implementation seems to be stalled: https://gerrit.wikimedia.org/r/#/c/180647/

daniel renamed this task from RfC: HTML templating library to RfC: HTML templating library (backend done, frontend stalled).Feb 27 2015, 3:37 PM
daniel moved this task from P1: Define to Under discussion on the TechCom-RFC board.
daniel moved this task from Under discussion to (unused 2) on the TechCom-RFC board.

https://gerrit.wikimedia.org/r/#/c/180647/ is merged, so can you call this resolved? Documentation ( HTML templates, also phpdoc/jsduck ?) needs updating writing, create a subtask of this if you like

Jdlrobson closed this task as Resolved.Mar 25 2015, 6:06 PM

Yes. Thanks to all involved! Templating is now in core in all the required forms.

If we want to open more specific RFCs around templating however feel free.

Spage moved this task from (unused 2) to Implemented on the TechCom-RFC board.Mar 26 2015, 1:01 AM
awight added a subscriber: awight.Apr 14 2015, 8:11 AM

TemplateParser looks great, and I'm thrilled to see a templating library made available! Would someone would please point me to the discussion that resulted in standardizing on Mustache rather than Handlebars? I've skimmed the RFC, and there is a discussion started by @Nuria which seems to lean heavily towards Handlebars, which is a superset of Mustache. Yet, above in this comment, a requirement is mentioned, that we want to limit ourselves to:

Exposing and depending only on standard Mustache features.

Was this decision possibly an accident?

Was this decision possibly an accident?

No, you can read the IRC log. I voted for mustache over handlebars because I felt like it was harder for developers to do unsafe things in mustache. That said, if developers are going to do their own extending of the template system to get those features, I'd rather use handlebars over rolling out own.

/me facepalms.

Thanks for the pointer, IRC answers my questions.

21:08:09 <ebernhardson> i'm really not interested in rewriting a bunch of things in php and javascript, and having some semblance of logic in the template makes alot of that possible

+1 ^^. But we can always revisit the decision if it turns out the lack of logic is tying our thumbs together.

Spage renamed this task from RfC: HTML templating library (backend done, frontend stalled) to RfC: HTML templating library (backend and frontend).Sep 18 2015, 8:58 PM
Krinkle edited projects, added TechCom-RFC (TechCom-RFC-Closed); removed TechCom-RFC.
Krinkle moved this task from Untriaged to Implemented on the TechCom-RFC (TechCom-RFC-Closed) board.