Page MenuHomePhabricator

Gitiles does not display <pre> blocks in repository descriptions
Closed, DeclinedPublic

Description

Most Markdown readme files were converted from ```lang=... to <pre lang="..."> as Phabricator only understands the latter. We have now abandoned the idea of using Phabricator as a repo / code review tool, and are using Gerrit + Gitiles instead; and Gitiles apparently does not understand the <pre> notation (compare https://gerrit.wikimedia.org/g/testing-access-wrapper/ with https://github.com/wikimedia/testing-access-wrapper/blob/master/README.md ).

Event Timeline

Quiddity renamed this task from Gititles does not display <pre> blocks in repository descriptions to Gitiles does not display <pre> blocks in repository descriptions.May 1 2018, 4:34 PM
Quiddity updated the task description. (Show Details)

Probably because we need to whitelist it or something. Gitiles takes a very cautious approach to rendering markdown iirc.

It would be nice to resurrect git.wikimedia.org and use it for Gitiles (and treat it as a mostly untrusted domain); that would make caution much less necessary.

Indeed. Two things:

  1. Right now we use it for old Gitblit url redirections. Phab handles this.
  2. Sadly, exposing it on a separate domain means we can't enforce the ACL stuff (cookie domain/plugin/etc stuff). For the most part this is no big deal, except for the non-public repos (of which we have a couple :()

(1) is pretty easy to fix with just moving where we handle the redirects/rewrites.
(2) is a little trickier. My best thought would be replication of all public repos or something and then running Gitiles as its own daemon and kill the plugin outright? If we did this, we could offload it from Gerrit and just run it off a Ganeti VM or somesuch. Would be easy to make fully redundant across DC's with LVS too! We'd sacrifice (gitiles) links for private repos, but that's a small price to pay (and only affects like 2 or 3 repos, tops).

Aside from whether we could safely and quickly replicate all references and internal branches and set up a new Gitiles server...

Whitelisting <pre> seems safe, and would need to happen even if we did set up a new server. Might even be possible to whitelist it upstream by default. While GitHub and Doxygen also support the <pre> syntax, use of that syntax is far less popular than backticks, so upstream may've simply never considered it.

Vvjjkkii renamed this task from Gitiles does not display <pre> blocks in repository descriptions to 7wdaaaaaaa.Jul 1 2018, 1:12 AM
Vvjjkkii triaged this task as High priority.
Vvjjkkii updated the task description. (Show Details)
Vvjjkkii removed a subscriber: Aklapper.
CommunityTechBot raised the priority of this task from High to Needs Triage.Jul 3 2018, 1:53 AM

From gitiles doc:

HTML

Most HTML tags are not supported. HTML will be dropped on the floor by the parser with no warnings, and no output from that section of the document.

< If markdown.safehtml is true there are small exceptions for <br>, <hr>, <a name> and <iframe> elements, ...

And indeed Phabricator requires the language to be passed as lang=php. With simply php it does not render anything:

use Wikimedia\TestingAccessWrapper;

class NonPublic {
	protected $prop;
	protected const CONSTANT = 4;
	protected function func() {}
	protected static function staticFunc() {}
}

lang=php:

use Wikimedia\TestingAccessWrapper;

class NonPublic {
	protected $prop;
	protected const CONSTANT = 4;
	protected function func() {}
	protected static function staticFunc() {}
}

Looks like we could make Phabricator learn that syntax which is widely used?

FWIW, the relevant spec is the info string in CommonMark:

The first word of the info string is typically used to specify the language of the code sample, and rendered in the class attribute of the code tag.

So I guess it is a two folds task:

A) have gitiles to whitelist <pre> and from its doc/parser it only accepts a, hr, br and iframe (which gets sanitized). I had a quick glance at gitiles code and they do use org.commonmark as a base class to process markdown, they have a postprocessor that explicitly accepts the previously mentioned tag. Maybe we can file that request Upstream? Probably at https://bugs.chromium.org/p/gerrit/issues/list?q=component%3Aplugins%3Egitiles&can=2

B) make Phabricator to respect the CommonMark spec and accepts php as a language hint. Seems to be in the following code:

src/infrastructure/markup/blockrule/PhutilRemarkupCodeBlockRule.php
  3 final class PhutilRemarkupCodeBlockRule extends PhutilRemarkupBlockRule {        
...
 46   public function markupText($text, $children) {
...
 59     $options = array(
 60       'counterexample'  => false,
 61       'lang'            => null,
 62       'name'            => null,
 63       'lines'           => null,
 64     );
 65
 66     $parser = new PhutilSimpleOptions();
 67     $custom = $parser->parse(head($lines));
 68     if ($custom) {
 69       $valid = true;
 70       foreach ($custom as $key => $value) {
// Rejects any option not listed in $options eg a plain 'php' is rejected:
 71         if (!array_key_exists($key, $options)) {
 72           $valid = false;
 73           break;
 74         }
 75       }
 76       if ($valid) {
 77         array_shift($lines);
 78         $options = $custom + $options;
 79       }
 80     }
// tries to guess language
119     if (empty($options['lang'])) {
120       // If the user hasn't specified "lang=..." explicitly, try to guess the
121       // language. If we fail, fall back to configured defaults.
122       $lang = PhutilLanguageGuesser::guessLanguage($text);
123       if (!$lang) {
124         $lang = nonempty(
125           $this->getEngine()->getConfig('phutil.codeblock.language-default'),
126           'text');
127       }
128       $options['lang'] = $lang;
129     }

The code only accepts options counterexample, lang, name, lines. Potentially any extra key could be looked up in the syntax highlighting engine and check whether it is a supported language, if so set lang. Or alternatively just blindly pass it as-is.

But maybe that Phabricator fix should be forked to a sub task.

Summary from my perspective:

The preferred Markdown syntax is "```lang", which is supported in most places except Phabricator but that doesn't matter since we aren't moving code hosting into Phab and Gitiles is the canonical code browser. Gerrit, Gitiles, GitHub, GitLab etc all support this syntax. It is also supported in JSDoc and Doxgen.

The alternative syntax of <pre lang=""> is custom to GitHub afaik, although Doxygen is capable of rendering it as plain PRE whilst tolerating/ignoring the lang attribute, but indeed it is broken in Gitiles.

Seems to me the solution is simple, to just use the Markdown syntax. In Phabricator comments and tasks we use Remarkup, which is intentionally designed to be different from Markdown and that's unlikely to change, but also doesn't really conflict with markdown files in repos.

I switched some READMEs to use <pre> in the past because at the time we were shifting to use Phabricator's code viewer and/or hosting, but that has since been abandoned. I suggest we decline this ticket as such.